
How can I merge more than two CSV files that look, for example, like these?

first csv file:

email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20

second csv file:

email,joe,mr
email,doe,mrs
name,jenny,gogh
year,talia,97

I would like to merge these files to look like this:

email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan

Any help would be appreciated.

2 Answers


Use DataFrame.merge with a left join, or with the default inner join:

import pandas as pd

# read the files into DataFrames; header=None because they have no header row
# (file1, file2: paths to your CSV files)
df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)

# left join on the first 2 columns (labelled 0 and 1)
df = df1.merge(df2, on=[0, 1], how='left')
print(df)
       0       1           2_x   2_y
0  email     joe    @gmail.com    mr
1  email     doe  @hotmail.com   mrs
2   name  emilly           doe   NaN
3   name   jenny           van  gogh
4   year   talia            19    97
5   year   kevin            20   NaN
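To see which rows have no partner in the second file (the rows an inner join would drop, and the ones that end up with NaN in the left join), you can pass `indicator=True` to `merge`. This is a minimal sketch that assumes the question's data; `io.StringIO` stands in for the real files, and the names `first`, `second`, `checked` and `unmatched` are only illustrative.

```python
from io import StringIO

import pandas as pd

# In-memory stand-ins for the question's two files (assumption: real code
# would pass file paths to pd.read_csv instead).
first = StringIO(
    "email,joe,@gmail.com\nemail,doe,@hotmail.com\nname,emilly,doe\n"
    "name,jenny,van\nyear,talia,19\nyear,kevin,20\n"
)
second = StringIO("email,joe,mr\nemail,doe,mrs\nname,jenny,gogh\nyear,talia,97\n")

df1 = pd.read_csv(first, header=None)
df2 = pd.read_csv(second, header=None)

# indicator=True adds a '_merge' column: 'both' when the key (columns 0 and 1)
# was found in both frames, 'left_only' when it exists only in df1.
checked = df1.merge(df2, on=[0, 1], how="left", indicator=True)
print(checked)

# The keys that an inner join would skip.
unmatched = checked.loc[checked["_merge"] == "left_only", [0, 1]]
print(unmatched)
```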

If you want rows without a match in the second file to be skipped, use the default inner join:

#inner join by first 2 columns
df = df1.merge(df2, on=[0,1])
print (df)
       0      1           2_x   2_y
0  email    joe    @gmail.com    mr
1  email    doe  @hotmail.com   mrs
2   name  jenny           van  gogh
3   year  talia            19    97

# write to file; na_rep='nan' writes missing values as "nan" (the default is an empty field)
df.to_csv(file3, index=False, header=False, na_rep='nan')
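For reference, here is a self-contained version of the left-join approach above that reproduces the output the question asks for. It is a sketch under the assumption that the data is exactly the question's sample; the files are replaced by in-memory strings, and `to_csv` is called without a path so it returns the CSV text instead of writing `file3`.

```python
from io import StringIO

import pandas as pd

# In-memory stand-ins for the two files (assumption: real code would pass
# file paths to pd.read_csv).
first = """email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20
"""
second = """email,joe,mr
email,doe,mrs
name,jenny,gogh
year,talia,97
"""

# header=None: no header row, so the columns are labelled 0, 1, 2.
df1 = pd.read_csv(StringIO(first), header=None)
df2 = pd.read_csv(StringIO(second), header=None)

# Left join on the first two columns keeps every row of the first file,
# in its original order; the clashing column 2 becomes '2_x' and '2_y'.
df = df1.merge(df2, on=[0, 1], how="left")

# na_rep="nan" writes missing values as the text "nan" instead of "".
result = df.to_csv(index=False, header=False, na_rep="nan")
print(result)
```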

4 Comments

Lines that don't exist in the second file would be skipped, e.g. the line name,emilly,doe,nan wouldn't be written to the merged file.
@iamgroot - Sorry, I am confused: is the output you need different from the one in the question? Also, why is year,talia,19 merged with gender,talia,97?
Oh, sorry, my fault. Please have a look at my question again; I have edited the code.
@iamgroot - added to my answer.

Update:

import pandas as pd
# df1, df2: the two files read with pd.read_csv(path, header=None)
pd.merge(df1, df2, on=[0, 1], how='outer') \
  .to_csv('output.csv', index=False, header=False, na_rep='nan')

# Content of file:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
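If pandas is not available, the same kind of outer merge keyed on the first two columns can be sketched with the standard `csv` module. This is a swapped-in technique, not the pandas answer: `merge_csv_texts` is a hypothetical helper that takes CSV text (for real files, pass `open(path, newline='').read()`), assumes every row has exactly three fields, and keeps keys in first-seen order, so it yields the rows in the order shown above.

```python
import csv
import io


def merge_csv_texts(texts, missing="nan"):
    """Outer-merge CSV texts on their first two columns.

    Assumes every row has exactly three fields (key, key, value), as in the
    question. Output rows keep first-seen key order; absent values become
    `missing`. If a key repeats within one source, its last value wins
    (pandas.merge would instead emit one row per matching pair).
    """
    rows = {}  # (field0, field1) -> one value slot per source
    for i, text in enumerate(texts):
        for record in csv.reader(io.StringIO(text)):
            if not record:
                continue  # ignore blank lines
            key = (record[0], record[1])
            slots = rows.setdefault(key, [missing] * len(texts))
            slots[i] = record[2]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (field0, field1), values in rows.items():
        writer.writerow([field0, field1, *values])
    return out.getvalue()


first = (
    "email,joe,@gmail.com\nemail,doe,@hotmail.com\nname,emilly,doe\n"
    "name,jenny,van\nyear,talia,19\nyear,kevin,20\n"
)
second = "email,joe,mr\nemail,doe,mrs\nname,jenny,gogh\nyear,talia,97\n"
print(merge_csv_texts([first, second]), end="")
```

Because it accepts a list, the same helper also handles three or more sources; each source simply gets its own value column.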

Update

How to merge more than 2 csv files? Can I use merge() for 3 csv files too?

I split your second file into 2 parts:

# data1.csv
email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20

# data2.csv
email,joe,mr
email,doe,mrs

# data3.csv
name,jenny,gogh
year,talia,97

Use reduce from the functools module:

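from functools import reduce
import pandas as pd

filenames = ['data1.csv', 'data2.csv', 'data3.csv']
dfs = [pd.read_csv(fn, header=None) for fn in filenames]
# fold the list: merge the 1st with the 2nd, then that result with the 3rd, ...
df = reduce(lambda left, right: pd.merge(left, right, on=[0, 1], how='outer'), dfs)
df.to_csv('output.csv', index=False, header=False, na_rep='nan')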

Output:

email,joe,@gmail.com,mr,nan
email,doe,@hotmail.com,mrs,nan
name,emilly,doe,nan,nan
name,jenny,van,nan,gogh
year,talia,19,nan,97
year,kevin,20,nan,nan
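The reduce version relies on pandas' default `_x`/`_y` suffixes to tell the value columns apart. With a fourth file, the new column `2` would again be renamed to `2_x`/`2_y`, clashing with the columns created by the first merge; recent pandas versions raise a `MergeError` for such duplicates. A sketch that avoids this by giving each file's value column its own name before merging (the names `v0`, `v1`, ... are arbitrary; the in-memory strings stand in for `data1.csv` to `data3.csv`):

```python
from functools import reduce
from io import StringIO

import pandas as pd

# In-memory stand-ins for data1.csv, data2.csv, data3.csv (assumption:
# real code would pass the file names to pd.read_csv).
sources = [
    "email,joe,@gmail.com\nemail,doe,@hotmail.com\nname,emilly,doe\n"
    "name,jenny,van\nyear,talia,19\nyear,kevin,20\n",
    "email,joe,mr\nemail,doe,mrs\n",
    "name,jenny,gogh\nyear,talia,97\n",
]

# Rename each file's value column (label 2) to v0, v1, ... so that no merge
# ever needs the '_x'/'_y' suffixes, however many files there are.
dfs = [
    pd.read_csv(StringIO(text), header=None).rename(columns={2: f"v{i}"})
    for i, text in enumerate(sources)
]

# Fold the list pairwise: ((df0 merge df1) merge df2) merge ...
merged = reduce(lambda left, right: left.merge(right, on=[0, 1], how="outer"), dfs)

output = merged.to_csv(index=False, header=False, na_rep="nan")
print(output)
```

Note that the pandas documentation describes `how='outer'` as sorting the join keys lexicographically, so depending on your pandas version the rows may come out sorted (`email,doe` before `email,joe`) rather than in the order shown above. If the first file already contains every key, `how='left'` keeps its row order.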

2 Comments

How to merge more than 2 csv files? Can I use merge() for 3 csv files too?
I updated my answer. Can you check it, please? Look carefully at the output.
