I'm performing data validation in Python using pandas. I have two datasets, source and target, that I compare for expected values. I've successfully merged the two dataframes using pd.merge, and I need to identify which columns cause a row to be marked left_only or right_only.
Obviously I can find the non-matching rows with ['_merge'] != 'both', but is there a way to output the names or positions of the columns responsible for the mismatch? That way I wouldn't have to scan each row to find which column is not working as expected.
For example, let's say these are the two dataframes:
SOURCE
| ID | First Name | Last Name |
|---|---|---|
| 001 | John | Doe |
| 002 | Roger | Smith |
| 003 | Maggie | Adams |
TARGET
| ID | First Name | Last Name |
|---|---|---|
| A001 | John | Doe |
| A002 | Roger | Smith |
| A003 | Maggie | Adams |
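
For reproducibility, the two tables above can be built and merged roughly as follows. This is a minimal sketch: the variable names `source`, `target`, `merged`, and `mismatched` are made up for illustration, the IDs are assumed to be strings, and the merge is assumed to be an outer merge on all shared columns with `indicator=True`.

```python
import pandas as pd

# Assumed reconstruction of the SOURCE table (IDs kept as strings)
source = pd.DataFrame({
    "ID": ["001", "002", "003"],
    "First Name": ["John", "Roger", "Maggie"],
    "Last Name": ["Doe", "Smith", "Adams"],
})

# Assumed reconstruction of the TARGET table
target = pd.DataFrame({
    "ID": ["A001", "A002", "A003"],
    "First Name": ["John", "Roger", "Maggie"],
    "Last Name": ["Doe", "Smith", "Adams"],
})

# With no `on` argument, merge joins on every column the frames share,
# so a difference in any one column prevents the rows from matching.
merged = source.merge(target, how="outer", indicator=True)

# Rows that exist on only one side of the merge
mismatched = merged[merged["_merge"] != "both"]
print(mismatched)
```

Because no row agrees on all three columns, every source row ends up as left_only and every target row as right_only.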
Expected output: ID
In this scenario, the _merge value is not 'both' because the values in the ID column do not match. What command will give me either the position or the name of the ID column in either dataframe?
If possible, I would also like to know how to find the exact position (row and column) of each mismatching value.
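
To make the desired result concrete, here is a rough sketch of a cell-by-cell comparison. It assumes both dataframes have the same shape, the same column labels, and rows that already line up by position, which a merge does not guarantee. The names `diff`, `mismatched_columns`, `column_positions`, and `cell_positions` are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example tables (IDs kept as strings)
source = pd.DataFrame({
    "ID": ["001", "002", "003"],
    "First Name": ["John", "Roger", "Maggie"],
    "Last Name": ["Doe", "Smith", "Adams"],
})
target = pd.DataFrame({
    "ID": ["A001", "A002", "A003"],
    "First Name": ["John", "Roger", "Maggie"],
    "Last Name": ["Doe", "Smith", "Adams"],
})

# Boolean frame: True wherever the two frames disagree.
# Note that NaN compared with NaN also counts as a difference here.
diff = source.ne(target)

# Names of the columns containing at least one mismatch
mismatched_columns = list(diff.columns[diff.any(axis=0)])

# Integer positions of those columns
column_positions = [source.columns.get_loc(c) for c in mismatched_columns]

# (row, column) integer positions of every mismatching cell
rows, cols = np.where(diff.to_numpy())
cell_positions = list(zip(rows.tolist(), cols.tolist()))

print(mismatched_columns)
print(column_positions)
print(cell_positions)
```

For the example data this reports the ID column at column position 0, with mismatches in rows 0 through 2. When the frames are aligned like this, `source.compare(target)` (available in pandas 1.1 and later) gives a similar side-by-side view of only the differing cells.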
indicator parameter of pandas.merge()?