Python: merging multiple text files

Question

I am new to Python, and not much of a coder. I have 40+ text files that I want to combine side by side into a single 'wide' csv (as opposed to a 'tall' csv; that is, I don't want to append the files to one another) and write the result to a new csv.

Using Pandas (merge) I am able to achieve what I want, but I presume there is a simpler way. Here it is on seven of the files:

import pandas as pd

a = pd.read_csv("c:/pyTest/B01001.txt")
b = pd.read_csv("c:/pyTest/B01002.txt")
c = pd.read_csv("c:/pyTest/B01003.txt")
d = pd.read_csv("c:/pyTest/B02001.txt")
e = pd.read_csv("c:/pyTest/B05001.txt")
f = pd.read_csv("c:/pyTest/B05002.txt")
g = pd.read_csv("c:/pyTest/B05012.txt")

merged = a.merge(b.merge(c.merge(d.merge(e.merge(f.merge(g, on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID')
merged.to_csv("c:/pytest/fook.csv", index=False)

It would also be great if the duplicated column names (e.g. 'GEOID') weren't repeated in the output file.

Any help from you experts greatly appreciated.

Can you show me an example of what two of the files look like (just a single row each) and how you would like the result to end up? I don't follow your 'tall'/'wide' analogy.
– brunsgaard, Commented Oct 7, 2014 at 19:15
I think this is very similar to what you want to do: stackoverflow.com/questions/18689453/…
– Vince, Commented Oct 7, 2014 at 19:20

unutbu · Accepted Answer · 2014-10-07 19:26:48Z

You can apply merge to a list of DataFrames using reduce:

import pandas as pd
import functools

files = ["c:/pyTest/B01001.txt", "c:/pyTest/B01002.txt", "c:/pyTest/B01003.txt",
         "c:/pyTest/B02001.txt", "c:/pyTest/B05001.txt", "c:/pyTest/B05002.txt",
         "c:/pyTest/B05012.txt",]
dfs = [pd.read_csv(filename).set_index('GEOID') for filename in files]
mergefunc = functools.partial(pd.merge, left_index=True, right_index=True)
merged = functools.reduce(mergefunc, dfs)

# GEOID is now the index, so write the index (the default); index=False would drop it.
merged.to_csv("c:/pytest/fook.csv")

When Pandas merges two DataFrames based on the index (rather than on columns), the resultant DataFrame uses the merged index. Thus you can avoid duplication of GEOID columns by merging on the index.
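To make this concrete, here is a minimal sketch using two small in-memory tables in place of the real files. Only GEOID comes from the question; the POP and AGE columns and all of the values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the text files (columns and values invented).
file_a = io.StringIO("GEOID,POP\n1,100\n2,200\n3,300\n")
file_b = io.StringIO("GEOID,AGE\n1,30\n2,40\n3,50\n")

left = pd.read_csv(file_a).set_index("GEOID")
right = pd.read_csv(file_b).set_index("GEOID")

# Merge on the index: the result carries one GEOID index plus the data columns.
merged = pd.merge(left, right, left_index=True, right_index=True)

# The key lives in the index, so it has to be written together with the index
# (the to_csv default) to appear in the output.
csv_text = merged.to_csv()
print(csv_text)
```

Since GEOID sits in the index after the merge, the call that writes the file needs to include the index; otherwise the key would be missing from the csv.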

For example:

In [99]: import numpy as np
In [100]: import pandas as pd
In [101]: import functools

In [102]: dfs = [pd.DataFrame(np.arange(6).reshape(3,2), columns=['A','B{}'.format(i)]).set_index('A') for i in range(3)]

In [103]: mergefunc = functools.partial(pd.merge, left_index=True, right_index=True)
In [104]: merged = functools.reduce(mergefunc, dfs)

In [105]: merged
Out[105]: 
   B0  B1  B2
A            
0   1   1   1
2   3   3   3
4   5   5   5
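With 40+ files, typing out every path gets tedious. The sketch below collects the paths with glob and applies the same reduce-based merge. The helper name merge_on_key, the "B*.txt" pattern, and the throwaway demo files are assumptions for illustration, not part of the original question.

```python
import functools
import glob
import os
import tempfile

import pandas as pd


def merge_on_key(paths, key="GEOID"):
    """Read each CSV, index it by `key`, and join all frames on that index.

    Hypothetical helper; pd.merge defaults to an inner join, so only keys
    present in every file survive.
    """
    dfs = [pd.read_csv(path).set_index(key) for path in paths]
    mergefunc = functools.partial(pd.merge, left_index=True, right_index=True)
    return functools.reduce(mergefunc, dfs)


# Demo on three small throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmpdir:
    for i in range(3):
        with open(os.path.join(tmpdir, "B0{}.txt".format(i)), "w") as fh:
            fh.write("GEOID,V{}\n1,{}\n2,{}\n".format(i, i, i * 10))

    # sorted() makes the column order deterministic.
    paths = sorted(glob.glob(os.path.join(tmpdir, "B*.txt")))
    merged = merge_on_key(paths)

    # The index (GEOID) is written by default.
    merged.to_csv(os.path.join(tmpdir, "merged.csv"))

print(merged)
```

For the files in the question, the pattern would presumably be something like "c:/pyTest/B*.txt", assuming every file in that folder matching the pattern should be merged.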
