1

I have several csv files that has same first row element in it. For example:

csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

And with these files (more than 2 files by the way) I want to merge them and create something like that:

left csv-1 csv-2
Value 0 0
Currency 0 1
datetime 0 0

How can I create this funtion in python?

2

3 Answers 3

2

First, you must create indexes in dataframes by first columns, on which you will further join:

import pandas as pd
import numpy as np

df1 = pd.read_csv('csv-1.csv')
df2 = pd.read_csv('csv-2.csv')

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

df = df1.join(df2, how='outer')

Then rename the column names if needed, or make a new index

Sign up to request clarification or add additional context in comments.

Comments

1

You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
           for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())

Output:

>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0

Comments

1

Here's what you can do

import pandas as pd
from glob import glob

def refineFilename(path):
    return path.split(".")[0]

df=pd.DataFrame()

for file in glob("csv-*.csv"):
    new=pd.read_csv(file,header=None,index_col=[0])
    df[refineFinename(file)]=new[1]

df.reset_index(inplace=True)
df.rename(columns={0:"left"},inplace=True)

print(df)

"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""

What we are doing here is making the df variable store all data, and then iterating through all files and adding a second column of those files to df with file name as the column name. 

1 Comment

Looking like just the longer version of @Corralien's code :P

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.