
I have 13 CSV files to merge. I wanted to try pandas and Python, but I am struggling.

There are 3 types of files, and the key column is a:
1) has columns a b c d
2) has columns a b c d (with a not containing any values from 1)
3) has columns a b c d e f g (with a containing all values from 1 and 2)

How could I go about merging these all into one CSV containing all the info from all the files?

  • Does pd.concat help at all? Commented Sep 9, 2015 at 21:33
  • I was trying that earlier, but then I discovered it was not doing what I was after, as it wouldn't check whether the unique column value was already in there or not. Commented Sep 9, 2015 at 21:34

2 Answers


You can do an outer merge as follows, making use of reduce (a built-in in Python 2; in Python 3 it has to be imported from functools):

from functools import reduce
import pandas
files = ['file1.csv', 'file2.csv', ...] # the 13 files
dataframes = [ pandas.read_csv( f ) for f in files ] # add arguments as necessary to the read_csv method
merged = reduce(lambda left,right: pandas.merge(left,right,on='a', how='outer'), dataframes)
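One thing to watch with the outer merge above: with on='a' alone, pandas keeps both copies of any other column the two frames share and renames them with suffixes (b_x, b_y and so on). A minimal sketch of one way around that is to join on every shared column instead. The in-memory CSV strings and the helper name outer_merge are hypothetical stand-ins for the real files, and this assumes b, c and d agree across files for the same value of a.

```python
from functools import reduce
import io

import pandas as pd

# Hypothetical stand-ins for the three file types described in the question.
csv_type1 = "a,b,c,d\nk1,1,2,3\n"
csv_type2 = "a,b,c,d\nk2,4,5,6\n"
csv_type3 = "a,b,c,d,e,f,g\nk1,1,2,3,x,y,z\nk2,4,5,6,p,q,r\n"

frames = [
    pd.read_csv(io.StringIO(text), dtype=str)
    for text in (csv_type1, csv_type2, csv_type3)
]


def outer_merge(left, right):
    # Join on every column the two frames have in common, so that
    # b, c and d are not duplicated as b_x / b_y.
    shared = [col for col in left.columns if col in right.columns]
    return pd.merge(left, right, on=shared, how="outer")


merged = reduce(outer_merge, frames)
print(merged)
```

If b, c or d differ between files for the same a, the rows will not match and will show up as separate rows, which is at least easy to spot in the output.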

3 Comments

And to return this as a CSV I would simply do merged.to_csv('merged.csv')
You can try that and load it with Excel or something, to see how it looks. There are several ways to go about this. You can also try posting some sample input and your desired output (can be very basic)
The input is almost random due to the sheer amount, but it consists of things like serial numbers, so it contains alphanumeric and quotation characters. What dtype would I have to set, and how?
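On the dtype question, one common approach is to pass dtype=str to read_csv so that every column is read as text and serial numbers keep their leading zeros. A small sketch with a made-up sample row (the data here is hypothetical, not from the asker's files):

```python
import io

import pandas as pd

# Hypothetical sample: a serial number with leading zeros and a quoted
# field that contains a comma and an escaped (doubled) quote character.
sample = 'a,b\n00123,"A-7, rev ""B"""\n'

# dtype=str stops pandas from guessing numeric types for the columns.
df = pd.read_csv(io.StringIO(sample), dtype=str)
print(df)
```

Quoted fields are handled by read_csv's default quotechar ('"'), so standard CSV quoting should not need extra arguments; files that use a different quote character would need quotechar set explicitly.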

Hard to write it exactly without seeing example data. But this should get you started.

import pandas as pd
df = pd.read_csv('file1.csv')
df = pd.concat([df, pd.read_csv('file2.csv')], ignore_index=True)  # this one adds more rows to the dataframe
df = df.merge(pd.read_csv('file3.csv'), on=['a', 'b', 'c', 'd'], how='left')  # this one will add columns where the key columns match
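To tie this to the three file types in the question, here is a rough sketch of the same idea end to end: stack the a b c d files, drop rows whose key a has already been seen, then pull in e f g from the wider file. The CSV strings are hypothetical stand-ins for the real files, and it assumes a uniquely identifies a row.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files.
type1 = pd.read_csv(io.StringIO("a,b,c,d\nk1,1,2,3\n"), dtype=str)
type2 = pd.read_csv(io.StringIO("a,b,c,d\nk2,4,5,6\n"), dtype=str)
type3 = pd.read_csv(
    io.StringIO("a,b,c,d,e,f,g\nk1,1,2,3,x,y,z\nk2,4,5,6,p,q,r\n"), dtype=str
)

# Stack the narrow files; type1 is passed twice to show that repeated
# keys are dropped rather than duplicated.
rows = pd.concat([type1, type2, type1], ignore_index=True)
rows = rows.drop_duplicates(subset="a")

# Take only the key and the extra columns from the wide file, so the
# merge does not produce suffixed copies of b, c and d.
extra = type3[["a", "e", "f", "g"]]
merged = rows.merge(extra, on="a", how="left")

# to_csv with no path returns the CSV text; pass a filename to write a file.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

how="left" keeps only keys present in the narrow files; if the wide file can contain keys that appear nowhere else, how="outer" would keep those rows too.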
