I'm a beginner in Python. I have two dataframes, each with 5 columns, but only the first two columns of each dataframe contain matching data. The dataframes have different numbers of records. I would like to compare column A (subscriptionId) from df1 against column A from df2 and, where they match, output column D (ownerEmail) from df2. Where the values in column A don't match, ownerEmail should be null.

df1

subscriptionId | displayName | state   | authorization | tenantId
12345          | DEV_SPS     | Enabled | RoleBased     | 938c49a8
67890          | PROD_LINUX  | Enabled | RoleBased     | 0a9cb9ee
11900          | TST_WIN     | Enabled | RoleBased     | e1513511

df2

subscriptionId | SubName    | Connected | ownerEmail         | organization
12345          | DEV_SPS    | Enabled   | [email protected] | Marketing
67890          | PROD_LINUX | Enabled   | [email protected] | Sales

Desired output

subscriptionId | displayName | state   | authorization | tenantId | ownerEmail       
12345          | DEV_SPS     | Enabled | RoleBased     | 938c49a8 | [email protected]
67890          | PROD_LINUX  | Enabled | RoleBased     | 0a9cb9ee | [email protected]
11900          | TST_WIN     | Enabled | RoleBased     | e1513511 | null

I have tried something like this but it didn't work.

df1['ownerEmail'] = np.where(df1['subscriptionId'] == df2['subscriptionId'], ['ownerEmail'], "")
print(df1)

Any help would be much appreciated.

Thank you.

1 Answer

Merge your dataframes on the subscriptionId column and keep all records from df1 (how='left'). Casting subscriptionId to str on both sides gives the key columns the same dtype:

>>> pd.merge(df1.astype({'subscriptionId': str}),
             df2[['subscriptionId', 'ownerEmail']].astype({'subscriptionId': str}),
             on='subscriptionId', how='left')

   subscriptionId displayName    state authorization  tenantId          ownerEmail
0           12345     DEV_SPS  Enabled     RoleBased  938c49a8  [email protected]
1           67890  PROD_LINUX  Enabled     RoleBased  0a9cb9ee  [email protected]
2           11900     TST_WIN  Enabled     RoleBased  e1513511                 NaN
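
For reference, here is a self-contained sketch of the same left merge that can be run as-is. The frames are rebuilt from the sample tables in the question; the email addresses are made-up placeholders (the real ones are not visible), and the IDs are assumed to be plain integers in both frames, so no dtype cast is needed here.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
df1 = pd.DataFrame({
    "subscriptionId": [12345, 67890, 11900],
    "displayName": ["DEV_SPS", "PROD_LINUX", "TST_WIN"],
    "state": ["Enabled"] * 3,
    "authorization": ["RoleBased"] * 3,
    "tenantId": ["938c49a8", "0a9cb9ee", "e1513511"],
})
df2 = pd.DataFrame({
    "subscriptionId": [12345, 67890],
    "SubName": ["DEV_SPS", "PROD_LINUX"],
    "Connected": ["Enabled", "Enabled"],
    # Placeholder addresses, not the real values from the question.
    "ownerEmail": ["owner1@example.com", "owner2@example.com"],
    "organization": ["Marketing", "Sales"],
})

# Keep every row of df1; bring over only ownerEmail from df2.
# Rows of df1 with no matching subscriptionId get NaN in ownerEmail.
result = df1.merge(
    df2[["subscriptionId", "ownerEmail"]],
    on="subscriptionId",
    how="left",
)
print(result)
```

Selecting only `subscriptionId` and `ownerEmail` from df2 before merging keeps the other df2 columns (SubName, Connected, organization) out of the result.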

3 Comments

Thanks. I tried that, but it's not working: df1.merge(df2[['subscriptionId', 'ownerEmail']], on='subscriptionId', how='left') throws the error "You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat".
Ok, there's some progress, as there's no error now. But the ownerEmail column shows NaN for all records. Here's what I tried: df3 = pd.merge(df1.astype({'subscriptionId': str}), df2[['subscriptionId', 'ownerEmail']].astype({'subscriptionId': str}), on='subscriptionId', how='left') print(df3)
Ok so I modified my dataframe slightly and followed the code that you've posted. It works! Would like to express my gratitude for your help. Thank you very much :)
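
The thread does not say what was changed to make it work. One possible cause of the all-NaN result, given the "object and float64" error, is that casting a float64 key to str yields values like '12345.0', which never equal '12345'. The sketch below illustrates that effect and one way around it. `normalize_key` is a hypothetical helper, not part of pandas, and it assumes the IDs are whole numbers as in the sample data (it would not suit GUID-style IDs).

```python
import pandas as pd

def normalize_key(series: pd.Series) -> pd.Series:
    """Hypothetical helper: convert numeric-looking IDs to strings like '12345'."""
    # Parse to numbers first so that '12345' and 12345.0 end up identical.
    numeric = pd.to_numeric(series, errors="coerce")
    # Go through the nullable Int64 dtype to drop the trailing '.0'.
    return numeric.astype("Int64").astype("string")

# Assumed setup: df1 holds the IDs as strings (object), df2 as floats.
df1 = pd.DataFrame({
    "subscriptionId": ["12345", "67890", "11900"],
    "displayName": ["DEV_SPS", "PROD_LINUX", "TST_WIN"],
})
df2 = pd.DataFrame({
    "subscriptionId": [12345.0, 67890.0],
    # Placeholder addresses, not the real values from the question.
    "ownerEmail": ["owner1@example.com", "owner2@example.com"],
})

# A plain str cast keeps the decimal part, so no key can match.
naive_keys = df2["subscriptionId"].astype(str).tolist()
print(naive_keys)

# Normalize both key columns the same way, then merge as before.
left = df1.assign(subscriptionId=normalize_key(df1["subscriptionId"]))
right = df2.assign(subscriptionId=normalize_key(df2["subscriptionId"]))
merged = left.merge(right, on="subscriptionId", how="left")
print(merged)
```

Printing `df1.dtypes` and `df2.dtypes` before merging is a quick way to check whether the two key columns actually differ in type.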
