I have two datasets, a and b. I want to left join b to a, but a row of b should be attached only where b['ColA'] matches a['ColA'] and a['ColC'] == 1.

Something like this (not valid pandas, just to show the idea):

expected_table = pd.merge(a, b, left_on=['ColA', ['ColC']==1], right_on=['ColA', ['ColC']==0])

import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

b =  pd.DataFrame({"ColA":["num 1", "num 2", "num 4"],
                   "Colx":[10,16,71],
                   "Coly":[0,0,0]})

Every value in Coly is 0.

expected = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                         "ColB": [5, 6, 7],
                         "ColC": [1, 1, 0],
                         "Colx": [10, 16, None]})

I solved it by adding a new column to b that holds the same values as the column in a I want to match on.

But I wonder whether there is a way to use conditions directly in the merge/join step, as in SQL.
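
The helper-column workaround might look roughly like the sketch below. The question does not show the actual code, so the details are assumptions: the helper column is taken to be named ColC and filled with the constant 1, so that merging on both ColA and ColC can only match rows of a where ColC == 1.

```python
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Assumption: the helper column is a constant 1 named like a's ColC,
# so the equality join on ColC acts as the "a.ColC == 1" condition.
b_helper = b.assign(ColC=1)

# Plain left join on both keys; rows of a with ColC != 1 find no partner.
result = a.merge(b_helper, on=["ColA", "ColC"], how="left")

# Coly was only needed on the b side, so drop it from the output.
result = result.drop(columns="Coly")
print(result)
```

Because this is an ordinary left join, every row of a is kept; rows that fail the condition simply get NaN in Colx.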

  • Any questions about the answer below? Commented Sep 30, 2021 at 18:43
  • Okay, so we are not actually merging on a condition but adding a query afterwards to slice the result as we wish. It is a good answer, thank you. Since pandas doesn't have a supporting feature, this will do the job well. Commented Oct 1, 2021 at 15:46

1 Answer

pandas has no feature for using conditions directly in the merge/join step, as you can in SQL. However, we can simulate it by chaining .merge() with .query(), whose syntax resembles an SQL WHERE clause.

To do this, left join a and b on matching ColA and set indicator=True. This adds a _merge column that tells us whether each merged row comes from a only ("left_only") or from both a and b ("both").

Then use .query() to apply the condition: if a row comes from both tables, keep it only when ColC == 1 and Coly == 0; if it comes from a only, keep it as it is.

df_out = (pd.merge(a, b, left_on='ColA', right_on ='ColA', how='left', indicator=True)
            .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))')
         )

Result:

print(df_out)


    ColA  ColB  ColC  Colx  Coly     _merge
0  num 1     5     1  10.0   0.0       both
1  num 2     6     1  16.0   0.0       both
2  num 3     7     0   NaN   NaN  left_only

Then we can drop the columns we no longer need with .drop(), as follows:

df_out = df_out.drop(['Coly', '_merge'], axis=1)

Result:

print(df_out)

    ColA  ColB  ColC  Colx
0  num 1     5     1  10.0
1  num 2     6     1  16.0
2  num 3     7     0   NaN
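
One caveat worth checking against your own data: with the .query() filter, a row of a that matches b on ColA but fails the condition is removed from the result altogether, whereas an SQL left join with the condition in the ON clause would keep it with NULLs. The sketch below is one possible way to keep such rows. It blanks out Colx instead of filtering, and it uses a hypothetical extra row ("num 4" with ColC == 0) that is not in the original data to make the difference visible.

```python
import pandas as pd

# Hypothetical variant of a: "num 4" matches b on ColA but has ColC == 0.
a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3", "num 4"],
                  "ColB": [5, 6, 7, 8],
                  "ColC": [1, 1, 0, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Filter-based approach from above: the "num 4" row is dropped.
queried = (a.merge(b, on="ColA", how="left", indicator=True)
             .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))'))

# Mask-based alternative: keep every row of a, but keep b's value
# only where the join condition holds.
merged = a.merge(b, on="ColA", how="left")
cond = (merged["ColC"] == 1) & (merged["Coly"] == 0)
merged["Colx"] = merged["Colx"].where(cond)  # NaN where cond is False
result = merged.drop(columns="Coly")

print(len(queried), len(result))
```

Which behaviour is right depends on whether rows of a that fail the condition should disappear or stay with an empty Colx.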