
Objective

I've reviewed the pandas documentation on merge, but I have a question about efficiently overriding values in a 'left' merge. I can do this simply for one pair of columns (as seen here), but it becomes cluttered when applying the same logic to multiple pairs.

Setup

If I take the following dataframes:

a = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
})

b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500]
})

I can merge them:

df = a.merge(b, on=['id'], how='left', suffixes=('','_y'))

to get

   id  val  val_y
0   0  100  500.0
1   1  100    NaN
2   2  100  500.0
3   3  100    NaN
4   4  100    NaN
5   5  100    NaN
6   6  100    NaN
7   7  100  500.0
8   8  100    NaN
9   9  100    NaN

I want to keep the left value wherever no right value exists, but overwrite it with the right value wherever one is available.

My desired outcome is:

   id    val
0   0  500.0
1   1  100.0
2   2  500.0
3   3  100.0
4   4  100.0
5   5  100.0
6   6  100.0
7   7  500.0
8   8  100.0
9   9  100.0

My Attempt

I know I can accomplish this with a few lines of code:

df.loc[df.val_y.notnull(), 'val'] = df.loc[df.val_y.notnull(), 'val_y']
df = df.drop(['val_y'], axis=1)

Or I can use the logic from this question.

But this becomes cluttered when there are multiple column pairings where I want to apply this logic.

For example, using a and b below:

a = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    'val_2': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
})
b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500],
    'val_2': [500, 500, 500]
})
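With these frames, my current approach has to repeat the overwrite-and-drop step once per column pair, which is what I mean by cluttered. A sketch of that, looping over the pairs by hand:

```python
import pandas as pd

a = pd.DataFrame({
    'id': list(range(10)),
    'val': [100] * 10,
    'val_2': [200] * 10,
})
b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500],
    'val_2': [500, 500, 500],
})

df = a.merge(b, on=['id'], how='left', suffixes=('', '_y'))
# One overwrite step per column pair
for col in ['val', 'val_2']:
    mask = df[col + '_y'].notnull()
    df.loc[mask, col] = df.loc[mask, col + '_y']
df = df.drop(['val_y', 'val_2_y'], axis=1)
```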

Is there a quicker, cleaner way to get my desired outcome?


5 Answers


I'd do this using set_index and update:

u = a.set_index('id')
u.update(b.set_index('id'))  # Update a's values with b's values

u.reset_index()

   id    val
0   0  500.0
1   1  100.0
2   2  500.0
3   3  100.0
4   4  100.0
5   5  100.0
6   6  100.0
7   7  500.0
8   8  100.0
9   9  100.0

The update is aligned on the index. For this reason, I set "id" to be the index in both DataFrames before performing the update step.

Note that the "id" column must be unique.
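A nice property of update is that it handles the multi-column case from the question with no extra code, since every shared column is updated in one call. Using the two-column a and b from the question (again assuming "id" is unique):

```python
import pandas as pd

a = pd.DataFrame({
    'id': list(range(10)),
    'val': [100] * 10,
    'val_2': [200] * 10,
})
b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500],
    'val_2': [500, 500, 500],
})

u = a.set_index('id')
u.update(b.set_index('id'))  # overwrites both 'val' and 'val_2' wherever b has a value
result = u.reset_index()
```

Note that update casts the updated columns to float, just as in the single-column case.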


Another option is using concat and drop_duplicates:

pd.concat([b, a]).drop_duplicates('id').sort_values('id')

   id  val
0   0  500
1   1  100
1   2  500
3   3  100
4   4  100
5   5  100
6   6  100
2   7  500
8   8  100
9   9  100

Since b overrides a, b must come first in the concat step.
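This version also scales to the multi-column frames unchanged, because the de-duplication only looks at 'id'. A sketch (with reset_index added so the jumbled index above comes out clean):

```python
import pandas as pd

a = pd.DataFrame({'id': list(range(10)),
                  'val': [100] * 10,
                  'val_2': [200] * 10})
b = pd.DataFrame({'id': [0, 2, 7],
                  'val': [500, 500, 500],
                  'val_2': [500, 500, 500]})

out = (pd.concat([b, a])          # b first so its rows win
         .drop_duplicates('id')   # keeps the first occurrence of each id
         .sort_values('id')
         .reset_index(drop=True))
```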


1 Comment

(-; a.assign(val=[dict(b.values).get(i, dict(a.values)[i]) for i in a.id])

numpy searchsorted and assign

import numpy as np

a.iloc[np.searchsorted(a.id, b.id), 1] = b.val.values
a

   id  val
0   0  500
1   1  100
2   2  500
3   3  100
4   4  100
5   5  100
6   6  100
7   7  500
8   8  100
9   9  100
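One caveat worth flagging: np.searchsorted only lands on the right rows when a.id is sorted and every id in b is actually present in a. A sketch with that assumption made explicit (using columns.get_loc instead of hard-coding the column position — a small refinement, not part of the answer above):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': list(range(10)), 'val': [100] * 10})
b = pd.DataFrame({'id': [0, 2, 7], 'val': [500, 500, 500]})

assert a.id.is_monotonic_increasing  # searchsorted requires sorted ids
pos = np.searchsorted(a.id, b.id)    # row positions of b's ids within a
a.iloc[pos, a.columns.get_loc('val')] = b.val.values
```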

4 Comments

This is the actualized code that my brain wanted to think of but did not (-:
@piRSquared haha, I think it's hard to find something different
Same idea but with loc... a.loc[a.index[a.id.searchsorted(b.id)], 'val'] = [*b.val]
How would this be done when id is actually a combination of multiple columns?

Goofing Off with dict

d = dict(a.values)
d.update(dict(b.values))
pd.DataFrame(dict(zip(a, zip(*d.items()))))

   id  val
0   0  500
1   1  100
2   2  500
3   3  100
4   4  100
5   5  100
6   6  100
7   7  500
8   8  100
9   9  100
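For what it's worth, this trick only works for two-column frames, since dict(a.values) takes the first column of each row as the key and the second as the value. The reconstruction step can also be written a little more directly (a sketch of the same idea):

```python
import pandas as pd

a = pd.DataFrame({'id': list(range(10)), 'val': [100] * 10})
b = pd.DataFrame({'id': [0, 2, 7], 'val': [500, 500, 500]})

d = dict(a.values)           # id -> val pairs from a
d.update(dict(b.values))     # b's pairs win
out = pd.DataFrame(list(d.items()), columns=a.columns)
```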



Perhaps the pandas way is now DataFrame.combine_first() - "Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two". This is relatively easy if you have a clear 1:1 assignment column to use as an index, and you must start from the superseding DataFrame.

a = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
})

b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500]
})

output = b.set_index('id').combine_first(a.set_index('id')).reset_index()
output
   id  val
0   0  500
1   1  100
2   2  500
3   3  100
4   4  100
5   5  100
6   6  100
7   7  500
8   8  100
9   9  100

And works with multiple columns:

c = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    'val_2': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
})
d = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500],
    'val_2': [500, 500, 500]
})
output = d.set_index('id').combine_first(c.set_index('id')).reset_index()
output
   id  val  val_2
0   0  500    500
1   1  100    200
2   2  500    500
3   3  100    200
4   4  100    200
5   5  100    200
6   6  100    200
7   7  500    500
8   8  100    200
9   9  100    200
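One behavior worth knowing: combine_first only fills nulls, so a NaN in the superseding frame falls back to the other frame's value rather than overwriting it. A small sketch:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': [0, 1, 2], 'val': [100, 100, 100]})
b = pd.DataFrame({'id': [0, 2], 'val': [500, np.nan]})

out = b.set_index('id').combine_first(a.set_index('id')).reset_index()
# id 2 keeps a's 100: b's NaN there is treated as "no value", not an override
```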



One other option is to do the merge as you are already doing it, then forward-fill across the columns so that each missing val_y picks up the val to its left:

df

   id  val  val_y
0   0  100  500.0
1   1  100    NaN
2   2  100  500.0
3   3  100    NaN
4   4  100    NaN
5   5  100    NaN
6   6  100    NaN
7   7  100  500.0
8   8  100    NaN
9   9  100    NaN

df.ffill(axis=1)  # fillna(method='ffill', axis=1) in older pandas; the method argument is now deprecated

    id    val  val_y
0  0.0  100.0  500.0
1  1.0  100.0  100.0
2  2.0  100.0  500.0
3  3.0  100.0  100.0
4  4.0  100.0  100.0
5  5.0  100.0  100.0
6  6.0  100.0  100.0
7  7.0  100.0  500.0
8  8.0  100.0  100.0
9  9.0  100.0  100.0

Then slice out just the combined column with .iloc[:, -1].
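Putting the whole thing together, a sketch of this approach end-to-end (ffill(axis=1) here is the non-deprecated spelling of fillna(method='ffill', axis=1), restricted to the value columns so id isn't pulled in):

```python
import pandas as pd

a = pd.DataFrame({'id': list(range(10)), 'val': [100] * 10})
b = pd.DataFrame({'id': [0, 2, 7], 'val': [500, 500, 500]})

merged = a.merge(b, on=['id'], how='left', suffixes=('', '_y'))
out = a[['id']].copy()
# fill each missing val_y from the val to its left, then keep the last column
out['val'] = merged[['val', 'val_y']].ffill(axis=1).iloc[:, -1]
```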
