Merging DataFrames in Python

Question

Say I have two DataFrames

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])

I want to merge them so that any value in df1 is overwritten if there is a value in df2 at the same location, and any new values in df2 are added, including its new rows and columns.

The result should be:

     A  B    C
0    1  3  NaN
1    2  8   10
2  NaN  9   11

I've tried combine_first, but that only overwrites NaN values. update has the opposite issue: it overwrites existing cells but doesn't add the new rows and columns from df2. merge has its own problems (for example, the overlapping column B comes back as B_x and B_y).
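A minimal sketch reproducing the three behaviors described above with the df1 and df2 from the question (the names first, updated and merged are only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])

# combine_first keeps the caller's non-NaN values, so df1's B=4 survives at row 1
first = df1.combine_first(df2)

# update works in place and only touches labels df1 already has:
# row 2 and column C from df2 are dropped
updated = df1.copy()
updated.update(df2)

# an outer merge on the index keeps both B columns under suffixed names
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

print(first)
print(updated)
print(merged.columns.tolist())  # ['A', 'B_x', 'B_y', 'C']
```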

I've tried writing my own function

import math

import numpy as np
import pandas as pd


def take_right(df1, df2, j, i):
    # Look up column j, row i in each frame; a missing label becomes NaN
    try:
        s1 = df1[j][i]
    except KeyError:
        s1 = np.nan
    try:
        s2 = df2[j][i]
    except KeyError:
        s2 = np.nan

    # Prefer df2's value unless it is NaN
    if math.isnan(s2):
        return s1
    else:
        return s2


def combine_df(df1, df2):
    rows = set(df1.index) | set(df2.index)
    columns = set(df1.columns) | set(df2.columns)
    df = pd.DataFrame()
    for i in rows:
        for j in columns:
            # Note: DataFrame.insert modifies df in place and returns None,
            # so this assignment replaces df with None after the first call
            df = df.insert(int(i), j, take_right(df1, df2, j, i), allow_duplicates=False)
    return df

This won't add new columns or rows to an empty DataFrame.
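For comparison, here is a minimal sketch of the same cell-by-cell idea that does run. It keeps the names take_right and combine_df from the attempt above, but looks each cell up with DataFrame.at, prefers df2's value unless it is missing or NaN, and builds the result from a dict in one step instead of calling insert. Looping over every cell in Python is slow, so this is only reasonable for small frames.

```python
import pandas as pd


def take_right(df1, df2, col, row):
    """Return df2's value at (row, col) unless it is missing or NaN; else df1's."""
    for frame in (df2, df1):  # df2 is checked first, so its values win
        if row in frame.index and col in frame.columns:
            value = frame.at[row, col]
            if not pd.isna(value):
                return value
    return float('nan')  # neither frame has a value here


def combine_df(df1, df2):
    rows = df1.index.union(df2.index)
    columns = df1.columns.union(df2.columns)
    # Collect every cell first, then build the frame once.
    # (DataFrame.insert works in place and returns None, so the original
    # `df = df.insert(...)` replaced df with None.)
    data = {col: [take_right(df1, df2, col, row) for row in rows] for col in columns}
    return pd.DataFrame(data, index=rows, columns=columns)


df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])
print(combine_df(df1, df2))
```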

Thank you!!

Are you sure combine_first doesn't work, and you weren't just doing it in the wrong order? – BeRT2me, Commented Jun 16, 2022 at 0:19

mitoRibo · Accepted Answer · 2022-06-15 22:34:38Z

3

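One approach is to create an empty output DataFrame whose index and columns are the unions of those of df1 and df2, and then use DataFrame.update to write df1's values and then df2's values into out_df. Because update only copies non-NaN values, df2 overwrites df1 wherever df2 has a value.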

import pandas as pd

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])


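# Empty frame covering every row and column label from either input
out_df = pd.DataFrame(
    columns=df1.columns.union(df2.columns),
    index=df1.index.union(df2.index),
)
out_df.update(df1)  # copy df1's values in
out_df.update(df2)  # df2's non-NaN values overwrite df1's
print(out_df)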
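A minimal sketch wrapping the same update-based approach in a reusable function (overwrite_merge is a hypothetical name, not a pandas API). It passes dtype=float when building the empty frame: without a dtype, the empty frame's columns are object dtype, and as far as I know the update calls do not change that, so the result would not be numeric.

```python
import pandas as pd


def overwrite_merge(left, right):
    """Union of both frames; right's non-NaN values overwrite left's."""
    # A float frame (instead of an untyped, all-object one) keeps the columns
    # numeric; NaN marks cells that neither input provides.
    out = pd.DataFrame(
        index=left.index.union(right.index),
        columns=left.columns.union(right.columns),
        dtype=float,
    )
    out.update(left)   # fill in everything left has
    out.update(right)  # then let right overwrite wherever it has a non-NaN value
    return out


df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])
print(overwrite_merge(df1, df2))
```

Because update skips NaN values in its argument, a NaN in df2 leaves df1's value in place, which matches "overwritten if there is a value in df2". The trade-off of the float frame is that integer columns such as B come back as floats.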

answered Jun 15, 2022 at 22:34

mitoRibo

BeRT2me · Accepted Answer · 2022-06-16 00:16:21Z

0

Why does combine_first not work? Call it on df2, so that df2's values take priority and df1 only fills in the gaps:

df = df2.combine_first(df1)  # df2's non-NaN values win; df1 fills the rest
print(df)

Output:

     A  B     C
0  1.0  3   NaN
1  2.0  8  10.0
2  NaN  9  11.0
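The order is what the comment on the question was getting at: combine_first keeps the caller's non-NaN values and only fills holes from its argument. A small sketch comparing both orders (right_wins and left_wins are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])

# The caller's non-NaN values take priority; the argument only fills gaps.
right_wins = df2.combine_first(df1)  # B at row 1 comes from df2
left_wins = df1.combine_first(df2)   # B at row 1 comes from df1

print(right_wins.loc[1, 'B'], left_wins.loc[1, 'B'])
```

Both calls produce the union of rows and columns; only the priority of overlapping cells changes.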

answered Jun 16, 2022 at 0:16

BeRT2me