Say I have two DataFrames
df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])
I want to merge so that any values in df1 are overwritten in there is a value in df2 at that location and any new values in df2 are added including the new rows and columns.
The result should be:
A B C
0 1 3 nan
1 2 8 10
2 nan 9 11
I've tried combine_first but that causes only nan values to be overwritten
updated has the issue where new rows are created rather than overwritten
merge has many issues.
I've tried writing my own function
def take_right(df1, df2, j, i):
print (df1)
print (df2)
try:
s1 = df1[j][i]
except:
s1 = np.NaN
try:
s2 = df2[j][i]
except:
s2 = np.NaN
if math.isnan(s2):
#print(s1)
return s1
else:
# print(s2)
return s2
def combine_df(df1, df2):
rows = (set(df1.index.values.tolist()) | set(df2.index.values.tolist()))
#print(rows)
columns = (set(df1.columns.values.tolist()) | set(df2.columns.values.tolist()))
#print(columns)
df = pd.DataFrame()
#df.columns = columns
for i in rows:
#df[:][i]=[]
for j in columns:
df = df.insert(int(i), j, take_right(df1,df2,j,i), allow_duplicates=False)
# print(df)
return df
This won't add new columns or rows to an empty DataFrame.
Thank you!!

combine_firstdoesn't work, and you weren't just doing it in the wrong order?