How to merge multiple csv files?

Question

I have several csv files that has same first row element in it. For example:

csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

And with these files (more than 2 files by the way) I want to merge them and create something like that:

left	csv-1	csv-2
Value	0	0
Currency	0	1
datetime	0	0

How can I create this funtion in python?

Isn't it better to have first column as header and second column from each file as rows? Anyway, what have you tried so far - show your code (minimal reproducible example) — buran
– buran, Commented Feb 13, 2023 at 16:26
Does this answer your question? Merging two pandas dataframes results in "duplicate" columns — imxitiz
– imxitiz, Commented Feb 13, 2023 at 16:41

angwrk · Accepted Answer · 2023-02-13 16:31:06Z

2

First, you must create indexes in dataframes by first columns, on which you will further join:

import pandas as pd
import numpy as np

df1 = pd.read_csv('csv-1.csv')
df2 = pd.read_csv('csv-2.csv')

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

df = df1.join(df2, how='outer')

Then rename the column names if needed, or make a new index

answered Feb 13, 2023 at 16:31

angwrk

4203 silver badges8 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Corralien · Accepted Answer · 2023-02-13 16:44:22Z

You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
           for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())

Output:

>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0

imxitiz · Accepted Answer · 2023-02-13 18:23:46Z

Here's what you can do

import pandas as pd
from glob import glob

def refineFilename(path):
    return path.split(".")[0]

df=pd.DataFrame()

for file in glob("csv-*.csv"):
    new=pd.read_csv(file,header=None,index_col=[0])
    df[refineFinename(file)]=new[1]

df.reset_index(inplace=True)
df.rename(columns={0:"left"},inplace=True)

print(df)

"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""

What we are doing here is making the df variable store all data, and then iterating through all files and adding a second column of those files to df with file name as the column name.

Looking like just the longer version of @Corralien's code :P

Collectives™ on Stack Overflow

How to merge multiple csv files?

3 Answers 3

Comments

Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Linked

Related