Python pandas conditional replace string based on column values

Question

Given these data frames:

import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF

  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D      7  2021

DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})
DF2

  ColX  ColY  ColZ
0    D  2021   100

If the following conditions are met:

COL1 = ColX from DF2

year = ColY from DF2

Then change the value in COL2 to ColZ from DF2.

What if there were multiple ColZ values for the same matching pairs of ColX and ColY?
– Alexander, Commented Oct 9, 2015 at 3:50
DF2['ColY'] should be ['2021'] correct? It says 2012, but 2021 in the output.
– Alexander, Commented Oct 9, 2015 at 4:14

Alexander · Accepted Answer · 2015-10-09 04:32:58Z

This looks like you want to update DF with data from DF2.

Assuming that all values in DF2 are unique for a given pair of values in ColX and ColY:

DF = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']], 
              how='left', 
              left_on=['COL1', 'year'], 
              right_index=True)
DF.COL2.update(DF.ColZ)
del DF['ColZ']

>>> DF
  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D    100  2021

I merge a temporary dataframe (DF2.set_index(['ColX', 'ColY'])[['ColZ']]) into DF, which adds a ColZ column holding the value from DF2 wherever its index (ColX and ColY) matches the values of COL1 and year in DF. Rows without a match get NaN.

I then use update to overwrite the values in DF.COL2 with the non-null values in DF.ColZ.

Finally, I delete DF['ColZ'] to clean up.
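The uniqueness assumption behind this merge can be checked before running it. The following is a minimal sketch, not part of the original answer: the duplicated DF2 rows are made up for illustration, and keeping the last row per pair is just one possible policy.

```python
import pandas as pd

# Hypothetical DF2 in which the same (ColX, ColY) pair appears twice.
DF2 = pd.DataFrame({'ColX': ['D', 'D'],
                    'ColY': ['2021', '2021'],
                    'ColZ': [100, 200]})

# Show every row whose (ColX, ColY) pair occurs more than once.
dupes = DF2[DF2.duplicated(subset=['ColX', 'ColY'], keep=False)]
print(dupes)

# One possible policy (an assumption): keep only the last row per pair.
DF2_unique = DF2.drop_duplicates(subset=['ColX', 'ColY'], keep='last')
print(DF2_unique)
```

If duplicates are left in place, the left merge would produce one output row per matching DF2 row, so DF would grow. DataFrame.merge also accepts validate='many_to_one', which should raise an error when the right-hand keys are not unique.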

If ColZ matches an existing column name in DF, then you would need to make some adjustments.
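One way to make that adjustment is to rename the incoming column before merging. This is only a sketch under assumptions: the DF2 variant below, whose value column is also called COL2, and the temporary name _new_COL2 are both hypothetical.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})

# Hypothetical variant of DF2 whose value column clashes with DF's COL2.
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'COL2': [100]})

# Rename the clashing column to a temporary name, then index by the key pair.
lookup = (DF2.rename(columns={'COL2': '_new_COL2'})
             .set_index(['ColX', 'ColY'])[['_new_COL2']])

merged = DF.merge(lookup, how='left',
                  left_on=['COL1', 'year'], right_index=True)

# Use the new value where one exists, otherwise keep the original,
# then restore the original integer dtype.
merged['COL2'] = (merged['_new_COL2']
                  .fillna(merged['COL2'])
                  .astype(DF['COL2'].dtype))
result = merged.drop(columns='_new_COL2')
print(result)
```

Here fillna stands in for the update call of the original answer; it is a swapped-in technique that avoids modifying a column through attribute access.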

An alternative solution is as follows:

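# update works in place and returns None, so do not assign its result.
# It also only touches columns with matching names, so rename ColZ to COL2.
DF = DF.set_index(['COL1', 'year'])
DF.update(DF2.set_index(['ColX', 'ColY']).rename(columns={'ColZ': 'COL2'}))
DF.reset_index(inplace=True)

The values match those above, but reset_index places COL1 and year first, and COL2 may be upcast to float depending on the pandas version.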

As the song goes: Thank you... thank you... thank God for you, the wind beneath my wings...
One more thing (hopefully): What if I wanted to add the condition: if less than all conditions (2) are met (found), replace the current value with 'n/a'?
With the first method above, I believe DF.ColZ will give you what you want (i.e. don't delete it). It is all matching values from DF2 given your two conditions, with n/a for unmatched values.
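A minimal sketch of that suggestion, assuming the DF and DF2 from the question: keep the merged ColZ and use it as the new COL2. The labelled Series with the literal string 'n/a' is an illustrative extra, not something the answer specifies.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

lookup = DF2.set_index(['ColX', 'ColY'])[['ColZ']]
merged = DF.merge(lookup, how='left',
                  left_on=['COL1', 'year'], right_index=True)

# ColZ is NaN wherever both conditions were not met, so copying it
# over COL2 marks every unmatched row as missing.
merged['COL2'] = merged['ColZ']
result = merged.drop(columns='ColZ')

# Optional: a display version with the literal string 'n/a'.
# This makes the column object-typed, so it is no longer numeric.
labelled = result['COL2'].astype(object).where(result['COL2'].notna(), 'n/a')
print(labelled)
```

Keeping NaN rather than the string is usually easier to work with afterwards, since the column stays numeric and isna() still identifies the unmatched rows.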
