
I have 4 data frames, as shown below:

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})

df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})

df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})

df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

I want to merge all the data frames into a single data frame called final.

I have tried pd.merge, but the problem is that I have to call it 3 separate times. Is there a simpler way of doing it?

I used the code below to get the result:

final = pd.merge(df, df1, on='_id', how='left')


final = pd.merge(final, df2, on='_id', how='left')


final = pd.merge(final, df3, on='_id', how='left')

I would like the final result to look something like this:

final.head()

_id | name | count of apple | count of orange | count of lime
1 | Charan | 5 | 8 | NaN
2 | Kumar | NaN | 4 | 7
3 | Nikhil | 3 | 6 | 9
4 | Kumar | 1 | NaN | 2

2 Answers


You can use concat, but first you need to convert _id to the index of each DataFrame with DataFrame.set_index, because concat with axis=1 aligns rows by index:

dfs = [df, df1, df2, df3]

df = pd.concat([x.set_index('_id') for x in dfs], axis=1).reset_index()

This is equivalent to:

df = df.set_index('_id')
df1 = df1.set_index('_id')
df2 = df2.set_index('_id')
df3 = df3.set_index('_id')

df = pd.concat([df, df1, df2, df3], axis=1).reset_index()

print(df)
   _id    name  count_of_apple  count_of_organge  count_of_lime
0    1  Charan             5.0               8.0            NaN
1    2   Kumar             NaN               4.0            7.0
2    3  Nikhil             3.0               6.0            9.0
3    4   Kumar             1.0               NaN            2.0
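
If the left-join behaviour of the original pd.merge calls matters, another option (not part of the answer above) is to fold the list of frames with functools.reduce. The following is a minimal sketch that assumes the four frames from the question; it keeps only the _id values present in the first frame, whereas concat with axis=1 keeps the union of all index values.

```python
from functools import reduce

import pandas as pd

# Frames from the question (column names kept exactly as posted).
df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})
df2 = pd.DataFrame({'_id': [1, 2, 3], 'count_of_organge': [8, 4, 6]})
df3 = pd.DataFrame({'_id': [2, 3, 4], 'count_of_lime': [7, 9, 2]})

dfs = [df, df1, df2, df3]

# Repeatedly left-merge the accumulated result with the next frame on '_id'.
# This is the same as the three chained pd.merge calls, written once.
final = reduce(
    lambda left, right: pd.merge(left, right, on='_id', how='left'),
    dfs,
)

print(final)
```

With this data both approaches should give the same table, since every _id in df1, df2 and df3 also appears in df.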

From the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ...:                     'B': ['B0', 'B1', 'B2', 'B3'],
   ...:                     'C': ['C0', 'C1', 'C2', 'C3'],
   ...:                     'D': ['D0', 'D1', 'D2', 'D3']},
   ...:                    index=[0, 1, 2, 3])
   ...:

In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
   ...:                     'D': ['D2', 'D3', 'D6', 'D7'],
   ...:                     'F': ['F2', 'F3', 'F6', 'F7']},
   ...:                    index=[2, 3, 6, 7])
   ...: 

In [9]: result = pd.concat([df1, df4], axis=1, sort=False)
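
The same snippet can be run as a plain script to inspect the result. This sketch only restates the documentation example; the comments describe what concat with axis=1 does with the two indexes.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
                    'D': ['D2', 'D3', 'D6', 'D7'],
                    'F': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 3, 6, 7])

# axis=1 places the frames side by side and aligns rows on the index.
# The default join is 'outer', so the result has one row per label in
# the union of both indexes (0, 1, 2, 3, 6, 7), with NaN where a frame
# has no row for that label. Columns are not merged: 'B' and 'D' each
# appear twice.
result = pd.concat([df1, df4], axis=1, sort=False)

print(result)
```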


1 Comment

You need to provide an index column: in every data frame, set the index with something like df.set_index('_id'), and then it will work.
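
To illustrate the point of this comment, here is a small sketch using two of the frames from the question. Without set_index, concat pairs rows by the default positional index rather than by _id; the variable names by_position and by_id are only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3, 4], 'name': ['Charan', 'Kumar', 'Nikhil', 'Kumar']})
df1 = pd.DataFrame({'_id': [1, 3, 4], 'count_of_apple': [5, 3, 1]})

# Without set_index, both frames carry the default index 0, 1, 2, ...
# Rows are paired by position, so the apple count for _id 3 lands next
# to _id 2, and the '_id' column appears twice in the result.
by_position = pd.concat([df, df1], axis=1)

# With '_id' as the index, rows are paired by id; ids missing from a
# frame get NaN in that frame's columns.
by_id = pd.concat(
    [df.set_index('_id'), df1.set_index('_id')],
    axis=1,
).reset_index()

print(by_position)
print(by_id)
```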
