How to find columns not matching in Pandas Merge?

Question

I'm performing data validation in Python using the Pandas module. I have two datasets to compare source and target data for expected values. I've successfully merged two dataframes using pd.merge and need to identify the columns causing the merge to be left or right only.

Obviously I can find the rows not matching with ['_merge'] != 'both', but is there a way to output the names or positions of the columns that cause the != 'both' condition? That way I wouldn't have to scan each row to find which column is not matching as expected.

For example, let's say these are the two dataframes:

SOURCE

ID	First Name	Last Name
001	John	Doe
002	Roger	Smith
003	Maggie	Adams

TARGET

ID	First Name	Last Name
A001	John	Doe
A002	Roger	Smith
A003	Maggie	Adams

Expected output: ID

In this scenario, the _merge value != 'both' due to the values in the ID column not matching. What command will give me either the position or name of the ID column in either dataframe?

If possible, I would also like to know how to find exact position (row and column) of mismatching values.

The question likely refers to the indicator parameter of pandas.merge().
– JonSG, Commented Sep 10 at 18:46
Can you provide the actual parameters you are passing to pandas.merge()?
– JonSG, Commented Sep 10 at 19:08
These are the parameters passed to pandas.merge():

dataframes_compare = pd.merge(
    SOURCE,
    TARGET,
    how='outer',     # full outer join
    indicator=True,  # adds column to output called _merge with source for each row
)
– Cassidy Alexander, Commented Sep 10 at 21:23

strawdog · Accepted Answer · 2025-09-10 20:42:30Z

I am not sure how efficient my solution is for large dataframes, but basically you can compare two dataframes by corresponding values. Something like this:

import pandas as pd

df1 = pd.DataFrame({"ID": ["001", "002", "003"],
                    "First Name": ["John", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})
df2 = pd.DataFrame({"ID": ["001", "A002", "A003"],
                    "First Name": ["John1", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})

# Element-wise "not equal", stacked into a Series indexed by (row, column)
res = df1.ne(df2).stack()
# Keep only the (row, column) pairs where the values differ
diffs = res[res].index.tolist()

diffs:

[(0, 'First Name'), (1, 'ID'), (2, 'ID')]

The major issue with this solution is that when your dataframes have different shapes, you additionally have to find out which of them has the extra elements.
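If only the names of the mismatching columns are needed (the "Expected output: ID" case from the question), the same element-wise comparison can be reduced per column. This is a minimal sketch, assuming both dataframes have the same shape and identical row and column labels; the names mismatch, bad_columns and bad_positions are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["001", "002", "003"],
                    "First Name": ["John", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})
df2 = pd.DataFrame({"ID": ["001", "A002", "A003"],
                    "First Name": ["John1", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})

# Boolean frame: True wherever the two values differ
mismatch = df1.ne(df2)

# Names of the columns that contain at least one differing value
bad_columns = [col for col in mismatch.columns if mismatch[col].any()]

# Integer positions of those columns in df1 (assumes unique column names)
bad_positions = [df1.columns.get_loc(col) for col in bad_columns]

print(bad_columns)    # ['ID', 'First Name']
print(bad_positions)  # [0, 1]
```

Note that ne treats a missing value as different from another missing value, so cells that are NaN in both dataframes would also be reported here.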

pixel-process · Accepted Answer · 2025-09-10 22:42:11Z

Pandas compare and equals should do what you need.

import pandas as pd
df1 = pd.DataFrame({"ID":["001", "002", "003"],
                    "First Name":["John", "Roger", "Maggie"],
                    "Last Name":["Doe", "Smith", "Adams"]})
df2 = pd.DataFrame({"ID":["001", "A002", "A003"],
                    "First Name":["John1", "Roger", "Maggie"],
                    "Last Name":["Doe", "Smith", "Adams"]})

df1.equals(df2)
# returns False

df1.compare(df2)

	('ID', 'self')	('ID', 'other')	('First Name', 'self')	('First Name', 'other')
0	NaN	NaN	John	John1
1	002	A002	NaN	NaN
2	003	A003	NaN	NaN

Starting with equals will let you know if there are differences. Compare provides the details of where the differences are.
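To get just the column names and row labels out of the compare result, its index and the top level of its column MultiIndex can be read directly. This is a rough sketch, assuming both dataframes are identically labeled (which compare requires) and a pandas version that provides DataFrame.compare (1.1 or later); diff_columns and diff_rows are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["001", "002", "003"],
                    "First Name": ["John", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})
df2 = pd.DataFrame({"ID": ["001", "A002", "A003"],
                    "First Name": ["John1", "Roger", "Maggie"],
                    "Last Name": ["Doe", "Smith", "Adams"]})

diff_columns, diff_rows = [], []

# Cheap check first; only build the detailed diff if something differs
if not df1.equals(df2):
    diff = df1.compare(df2)
    # Level 0 of the column MultiIndex holds the original column names
    diff_columns = diff.columns.get_level_values(0).unique().tolist()
    # The index holds the labels of the rows with at least one difference
    diff_rows = diff.index.tolist()

print(diff_columns)  # ['ID', 'First Name']
print(diff_rows)     # [0, 1, 2]
```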

You can also target individual columns to get more concise output.

df1['ID'].compare(df2['ID'])

	self	other
1	002	A002
2	003	A003
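Applied to the SOURCE and TARGET data from the question, the same idea might look like the sketch below. It assumes the rows of the two dataframes correspond by position, which is an assumption on my part since the ID values themselves differ. A merge without an explicit on uses all common columns as keys, so the _merge indicator can only say that a whole row failed to match, not which column caused it; a per-column check is what narrows it down. The lowercase names source, target, unmatched and mismatch_cols are illustrative.

```python
import pandas as pd

source = pd.DataFrame({"ID": ["001", "002", "003"],
                       "First Name": ["John", "Roger", "Maggie"],
                       "Last Name": ["Doe", "Smith", "Adams"]})
target = pd.DataFrame({"ID": ["A001", "A002", "A003"],
                       "First Name": ["John", "Roger", "Maggie"],
                       "Last Name": ["Doe", "Smith", "Adams"]})

# Full outer join on all common columns, as in the question
merged = pd.merge(source, target, how="outer", indicator=True)

# Every row is left_only or right_only because the ID values never match
unmatched = merged[merged["_merge"] != "both"]
print(len(unmatched))  # 6

# Compare column by column, row position by row position
mismatch_cols = [col for col in source.columns
                 if not source[col].equals(target[col])]
print(mismatch_cols)  # ['ID']
```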
