Say I have two DataFrames:

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])

I want to merge them so that any value in df1 is overwritten if there is a value in df2 at that location, and any new values in df2 are added, including the new rows and columns.

The result should be:

     A  B    C
0    1  3  NaN
1    2  8   10
2  NaN  9   11

I've tried combine_first, but that causes only NaN values to be overwritten. update has the issue where new rows are created rather than overwritten, and merge has many issues.

I've tried writing my own function:

def take_right(df1, df2, j, i):
    print (df1)
    print (df2)
    try:
        s1 = df1[j][i]
    except:
        s1 = np.NaN
    try:
        s2 = df2[j][i]
    except:
        s2 = np.NaN
    
    if math.isnan(s2):
        #print(s1)
        return s1
    else:
       # print(s2)
        return s2
    
def combine_df(df1, df2):
    
    rows = (set(df1.index.values.tolist()) | set(df2.index.values.tolist()))
    #print(rows)
    columns = (set(df1.columns.values.tolist()) | set(df2.columns.values.tolist()))
    #print(columns)
    df = pd.DataFrame()
    #df.columns = columns
    for i in rows:
        #df[:][i]=[]
        for j in columns:
                
                df = df.insert(int(i), j, take_right(df1,df2,j,i), allow_duplicates=False)
   # print(df)
                
    return df

This won't add new columns or rows to an empty DataFrame.

Thank you!!

  • Are you sure combine_first doesn't work, and you weren't just doing it in the wrong order? Commented Jun 16, 2022 at 0:19

2 Answers

One approach is to create an empty output DataFrame whose columns and index are the union of those in df1 and df2, and then use the DataFrame.update method to copy the values in. Updating with df1 first and df2 second means df2's values win wherever both frames have one:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])


out_df = pd.DataFrame(
    columns = df1.columns.union(df2.columns),
    index = df1.index.union(df2.index),
)
out_df.update(df1)
out_df.update(df2)
out_df
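
The same idea can be wrapped in a small reusable function. This is only a sketch: the name merge_prefer_right is made up for illustration, and passing dtype=float is an added assumption that all data is numeric (the empty frame above has no dtype given, so its columns are object dtype).

```python
import pandas as pd


def merge_prefer_right(left, right):
    """Combine two frames; values from `right` win where both have one.

    Hypothetical helper, assuming purely numeric data (hence dtype=float).
    """
    # Start from an all-NaN frame covering every row and column label.
    out = pd.DataFrame(
        index=left.index.union(right.index),
        columns=left.columns.union(right.columns),
        dtype=float,
    )
    # update() modifies `out` in place and skips NaN values in its argument,
    # so applying `right` last lets it overwrite values copied from `left`.
    out.update(left)
    out.update(right)
    return out


df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])

result = merge_prefer_right(df1, df2)
print(result)
```

Because update ignores NaN in the frame passed to it, a missing value in df2 never erases an existing value from df1.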


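Why does combine_first not work? Calling it on df2, so that df2's values take priority and df1 only fills the gaps, gives the result you asked for: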

df = df2.combine_first(df1)
print(df)

Output:

     A  B     C
0  1.0  3   NaN
1  2.0  8  10.0
2  NaN  9  11.0
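
The order of the call matters, which may be why it appeared not to work. A minimal sketch comparing both orders on the frames from the question (the names prefer_df2 and prefer_df1 are just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])

# The frame the method is called on has priority;
# the argument is only used to fill its missing values.
prefer_df2 = df2.combine_first(df1)
prefer_df1 = df1.combine_first(df2)

# Row 1, column B exists in both frames (4 in df1, 8 in df2).
print(prefer_df2.loc[1, 'B'])  # df2's value is kept
print(prefer_df1.loc[1, 'B'])  # df1's value is kept
```

With df1.combine_first(df2), only the NaN gaps in df1 are filled, which matches the behaviour described in the question.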
