How can I replace values in a pandas dataframe with values from another dataframe based on common columns?

I need to replace the NaN values in df1 with the values from df2, matched on the common columns "types" and "o_periods". Any suggestions?

df1

types  c_years  o_periods  s_months  incidents
0       1        1          1     127.0        0.0
1       1        1          2      63.0        0.0
2       1        2          1    1095.0        3.0
3       1        2          2    1095.0        4.0
4       1        3          1    1512.0        6.0
5       1        3          2    3353.0       18.0
6       1        4          1       NaN        NaN
7       1        4          2    2244.0       11.0
8       2        1          1   44882.0       39.0
9       2        1          2   17176.0       29.0
10      2        2          1   28609.0       58.0
11      2        2          2   20370.0       53.0
12      2        3          1    7064.0       12.0
13      2        3          2   13099.0       44.0
14      2        4          1       NaN        NaN
15      2        4          2    7117.0       18.0
16      3        1          1    1179.0        1.0
17      3        1          2     552.0        1.0
18      3        2          1     781.0        0.0
19      3        2          2     676.0        1.0
20      3        3          1     783.0        6.0
21      3        3          2    1948.0        2.0
22      3        4          1       NaN        NaN
23      3        4          2     274.0        1.0
24      4        1          1     251.0        0.0
25      4        1          2     105.0        0.0
26      4        2          1     288.0        0.0
27      4        2          2     192.0        0.0
28      4        3          1     349.0        2.0
29      4        3          2    1208.0       11.0
30      4        4          1       NaN        NaN
31      4        4          2    2051.0        4.0
32      5        1          1      45.0        0.0
33      5        1          2       NaN        NaN
34      5        2          1     789.0        7.0
35      5        2          2     437.0        7.0
36      5        3          1    1157.0        5.0
37      5        3          2    2161.0       12.0
38      5        4          1       NaN        NaN
39      5        4          2     542.0        1.0

df2

 types  o_periods  s_months  incidents
0      1          1     911.0        3.0
1      1          2    1689.0        8.0
2      2          1   26852.0       36.0
3      2          2   14440.0       36.0
4      3          1     914.0        2.0
5      3          2     862.0        1.0
6      4          1     296.0        1.0
7      4          2     889.0        4.0
8      5          1     664.0        4.0
9      5          2    1047.0        7.0

df3: rows with NaN

    types  c_years  o_periods  s_months  incidents
6       1        4          1       NaN        NaN
14      2        4          1       NaN        NaN
22      3        4          1       NaN        NaN
30      4        4          1       NaN        NaN
33      5        1          2       NaN        NaN
38      5        4          1       NaN        NaN

I have tried to merge df2 with df3, but the index seems to be reset.

  • Please refrain from posting images. Instead, copy and paste your data (e.g. the output of print(df)) directly into your question. Also, please post your expected output along with what you have tried. Commented Aug 24, 2021 at 3:52
  • It would help if you pasted text values for the sample data in df1 and df2 instead of screenshots. The text values could be used to answer your question. Commented Aug 24, 2021 at 3:56

1 Answer

First, separate the rows that have NaN values into a new dataframe called df3, and drop those rows from df1.

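Then do a left join of df3 against df2 on the common columns. Drop the NaN columns from df3 first; otherwise the merge keeps both copies with suffixes (s_months_x, s_months_y, and so on).

df4 = pd.merge(df3.drop(columns=['s_months','incidents']), df2, how='left', on=['types','o_periods'])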

After that is done, append the rows from df4 back into df1 (for example with pd.concat, since DataFrame.append was removed in pandas 2.0).
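As a rough sketch of the steps above, here is one way they could fit together on a small subset of the question's data. The names `keys`, `value_cols`, `missing`, `complete`, and `result` are only illustrative. Two details are assumptions on my part rather than something confirmed in the question: the NaN columns are dropped from df3 before merging so that no `_x`/`_y` duplicates appear, and the original row labels are carried through `reset_index()` because `merge` returns a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Small subset of the question's df1, keeping the original row labels
df1 = pd.DataFrame(
    {
        "types": [1, 1, 1, 1, 5, 5],
        "c_years": [3, 3, 4, 4, 1, 1],
        "o_periods": [1, 2, 1, 2, 1, 2],
        "s_months": [1512.0, 3353.0, np.nan, 2244.0, 45.0, np.nan],
        "incidents": [6.0, 18.0, np.nan, 11.0, 0.0, np.nan],
    },
    index=[4, 5, 6, 7, 32, 33],
)
# Matching subset of df2: one row per (types, o_periods) pair
df2 = pd.DataFrame(
    {
        "types": [1, 1, 5, 5],
        "o_periods": [1, 2, 1, 2],
        "s_months": [911.0, 1689.0, 664.0, 1047.0],
        "incidents": [3.0, 8.0, 4.0, 7.0],
    }
)

keys = ["types", "o_periods"]
value_cols = ["s_months", "incidents"]

# Split df1 into the rows that need filling and the rows that are complete
missing = df1[value_cols].isna().any(axis=1)
df3 = df1[missing].drop(columns=value_cols)
complete = df1[~missing]

# reset_index() turns the row labels into an "index" column so they
# survive the merge; set_index() restores them afterwards
df4 = (
    df3.reset_index()
    .merge(df2, how="left", on=keys)
    .set_index("index")
    .rename_axis(None)
)

# Put the filled rows back and restore the original row order
result = pd.concat([complete, df4]).sort_index()
print(result)
```

The filled rows keep their original labels (6 and 33 here), so `sort_index()` puts them back where they were in df1.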

Another way is to combine the two columns you want to look up into a single key column (the "_" separator keeps keys such as 1 + 12 and 11 + 2 distinct):

df1["types_o"] = df1["types"].astype(str) + "_" + df1["o_periods"].astype(str)

df2["types_o"] = df2["types"].astype(str) + "_" + df2["o_periods"].astype(str)

Then you can look up the missing values by mapping each key in df1 to the matching row in df2.

df1["types_o"] = df1["types_o"].replace('nan', np.nan)

df1.loc[df1['s_months'].isnull(),'s_months'] = df1['types_o'].map(df2.set_index('types_o')['s_months'])

df1.loc[df1['incidents'].isnull(),'incidents'] = df1['types_o'].map(df2.set_index('types_o')['incidents'])
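The same lookup can also be done on both key columns directly, without building a string key. The sketch below is an alternative I am swapping in, not the map-based code above: it uses `DataFrame.join` with `on=` against df2 indexed by the two key columns, which keeps df1's original index. The data is a small subset of the question's, and `keys`, `value_cols`, `joined`, `filled`, and the `_fill` suffix are illustrative names. It assumes each (types, o_periods) pair appears only once in df2.

```python
import numpy as np
import pandas as pd

# Small subset of the question's df1, keeping the original row labels
df1 = pd.DataFrame(
    {
        "types": [1, 1, 1, 1, 5, 5],
        "c_years": [3, 3, 4, 4, 1, 1],
        "o_periods": [1, 2, 1, 2, 1, 2],
        "s_months": [1512.0, 3353.0, np.nan, 2244.0, 45.0, np.nan],
        "incidents": [6.0, 18.0, np.nan, 11.0, 0.0, np.nan],
    },
    index=[4, 5, 6, 7, 32, 33],
)
# Matching subset of df2: one row per (types, o_periods) pair
df2 = pd.DataFrame(
    {
        "types": [1, 1, 5, 5],
        "o_periods": [1, 2, 1, 2],
        "s_months": [911.0, 1689.0, 664.0, 1047.0],
        "incidents": [3.0, 8.0, 4.0, 7.0],
    }
)

keys = ["types", "o_periods"]
value_cols = ["s_months", "incidents"]

# Look up each row's (types, o_periods) pair in df2's index.
# The result keeps df1's index; df2's overlapping columns get "_fill".
joined = df1.join(df2.set_index(keys), on=keys, rsuffix="_fill")

# Only the NaN cells take the looked-up value; existing values stay
filled = df1.copy()
for col in value_cols:
    filled[col] = joined[col].fillna(joined[col + "_fill"])

print(filled)
```

Because the row labels never change, nothing has to be appended back or re-sorted afterwards.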

You didn't paste any code or sample data that is easily reproducible, so this is the best I can do.

5 Comments

Hi, I extracted the rows with NaN values into a dataframe called df3. However, when I try to merge df3 and df2, the row index is reset.
It doesn't matter, right? Because you're merging on different columns.
Does the order of the data matter to you? If it does, you can create a hard-coded index column.
Yup, I hard-coded the index and appended the dataframe back. Thanks for your help, though! I used your method to left join the two dataframes.
If my answer helped, please mark it as correct. Thank you!
