Python how to join/merge Pandas dataframes with matching columns of specific values from different dataframes

Question

I have two datasets, a and b. I want to left join b to a on ColA, but a row of b should only be joined where a['ColC'] == 1 and b['Coly'] == 0.

Something like this (pseudocode, not valid pandas): expected_table = pd.merge(a, b, left_on=['ColA', ['ColC']==1], right_on=['ColA', ['Coly']==0])

import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

Coly is 0 in every row.

expected = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                         "ColB": [5, 6, 7],
                         "ColC": [1, 1, 0],
                         "Colx": [10, 16, None]})

I solved it by creating a new column in table b that holds the same value as a['colx'].

But I wonder whether there is a way to use conditions in the merge/join step itself, as in SQL.
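The workaround mentioned above might look roughly like the sketch below. It rests on an assumption the question does not spell out: that the helper column added to b is named ColC and is set to 1, so it can only pair with rows of a where ColC == 1. The names b_keyed and expected_table are illustrative.

```python
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Keep only rows of b with Coly == 0, then add a helper key (assumed name: ColC)
# whose constant value 1 can only match rows of a where ColC == 1.
b_keyed = b.loc[b["Coly"] == 0, ["ColA", "Colx"]].assign(ColC=1)

# An ordinary left join on both keys now behaves like the conditional join.
expected_table = a.merge(b_keyed, on=["ColA", "ColC"], how="left")

print(expected_table)
```

Rows of a with ColC == 0 find no partner in b_keyed, so they stay in the result with a missing Colx.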

Okay, so we are not actually merging on a condition but adding a query afterwards to slice the result as we wish. It is a good answer, thank you. Since pandas doesn't have a supporting feature, this will do the job well.
– bohontw, Commented Oct 1, 2021 at 15:46

SeaBean · Accepted Answer · 2021-09-30 16:23:43Z

Pandas has no feature for using conditions directly in the merge/join step as SQL does. However, we can simulate it by chaining .merge() with .query(), whose expression syntax resembles a SQL WHERE clause.

To do this, left join a and b on matching ColA and set indicator=True. This adds a _merge column that tells us whether each merged row comes from a only ("left_only") or from both a and b ("both").

Then use .query() to apply the condition: if a row was merged from both tables, keep it only when ColC == 1 and Coly == 0. If it comes from a only, keep it unconditionally.

df_out = (pd.merge(a, b, on='ColA', how='left', indicator=True)
            .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))')
         )

Result:

print(df_out)


    ColA  ColB  ColC  Colx  Coly     _merge
0  num 1     5     1  10.0   0.0       both
1  num 2     6     1  16.0   0.0       both
2  num 3     7     0   NaN   NaN  left_only

Then we can drop the helper columns with .drop(), as follows:

df_out = df_out.drop(['Coly', '_merge'], axis=1)

Result:

print(df_out)

    ColA  ColB  ColC  Colx
0  num 1     5     1  10.0
1  num 2     6     1  16.0
2  num 3     7     0   NaN
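One caveat worth checking against your own data: a row of a that does match b on ColA but fails the condition is removed by the .query() step, whereas a SQL LEFT JOIN with the condition in its ON clause would keep that row with NULLs. The sketch below illustrates the difference with a hypothetical extra row ("num 4", ColC == 0) added to a. It is not part of the original data, and the names b_ok, filtered and sql_like are illustrative.

```python
import pandas as pd

# "num 4" is a hypothetical row added to show the difference.
a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3", "num 4"],
                  "ColB": [5, 6, 7, 8],
                  "ColC": [1, 1, 0, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Merge-then-query: "num 4" matches b but has ColC == 0, so it is dropped.
filtered = (pd.merge(a, b, on="ColA", how="left", indicator=True)
              .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))')
              .drop(columns=["Coly", "_merge"]))

# SQL-like variant: apply the right-side condition to b before joining ...
b_ok = b.loc[b["Coly"] == 0, ["ColA", "Colx"]]
sql_like = a.merge(b_ok, on="ColA", how="left")

# ... and blank out the joined values where the left-side condition fails,
# so every row of a is kept.
sql_like["Colx"] = sql_like["Colx"].where(sql_like["ColC"] == 1)

print(filtered)
print(sql_like)
```

Here filtered has three rows while sql_like keeps all four rows of a, with Colx missing for "num 3" and "num 4". This variant assumes ColA is unique in b. With duplicate keys in b, rows of a that fail the condition would still be repeated by the join.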
