Transpose a text file to csv using Python

Question

I'm totally new to Python. I have a huge text file and want to do two things with it: 1. Extract a certain region from it, which I've been able to do. 2. Transpose the extracted region and write it to a csv file. The second step has turned out to be a little tricky; the zip function didn't do what I wanted. Here's the data printed by step 1, which I'd like to transpose.

Number  "A1"    "A2"    "A3"    "A4"

Data    "ABCD"  "ABCD"  "ABCD"  "ABCD"

Date    "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"

There's an empty line between each line. I need to transpose this data and save to a csv file (without splitting the date into two separate columns). I have many such files and the headers change for each. So pandas didn't work either.

import csv
import pandas as pd
colnames= ['Number','Data','Date']
fw=open("output.csv", "w")
f= open('input.txt', "rb")
fi = csv.writer(fw, delimiter=',',quoting=csv.QUOTE_ALL)
l = f.read()
ll= [x.split(',') for x in l.split('||')]
cols1 = ll[0]
cols2 = ll[1]
cols3 = ll[2]

final_cols = [cols1, cols2, cols3]
s= zip(*final_cols)
df = pd.DataFrame(s)
df.to_csv(fw, index=False, header=False)

@PadraicCunningham Using zip, the output looks something like this: [('N', 'u', 'm', 'b', 'e', 'r',
– abn, Commented Nov 11, 2014 at 0:03
@PadraicCunningham That works fine for this particular file, but it may not work for the rest because the headers keep changing, and here I mentioned the headers explicitly.
– abn, Commented Nov 11, 2014 at 0:09
@PadraicCunningham I tried doing text = data.split() for row in text: print(''.join(row)) print >> out, row but it returned output like this: A1 A2 A3 A4 ABCD ABCD ABCD ABCD Jan 04 2013 Jan 04 2013 Jan 04 2013 Jan 04 2013, with everything in the same column and each date split into three rows.
– abn, Commented Nov 11, 2014 at 0:22

Padraic Cunningham · Accepted Answer · 2014-11-11 00:32:40Z


Using your data, re.sub can replace the space inside each date with a comma so that splitting on whitespace keeps the date together:

import re
with open("in.txt") as f:
    lines = [re.sub(r'\s(?=\d\d,)', ",", x).split() for x in f if x.strip()]
    print(list(zip(*lines)))
[('Number', 'Data', 'Date'), ('A1', 'ABCD', 'Jan,04,2013'), ('A2', 'ABCD', 'Jan,04,2013'), ('A3', 'ABCD', 'Jan,04,2013'), ('A4', 'ABCD', 'Jan,04,2013')]
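To see what the substitution does on its own, here is a minimal sketch applied to a single sample line. It assumes the fields are unquoted, as in the printed output above; the variable names are illustrative only.

```python
import re

# One sample line, assuming the fields carry no surrounding quotes
line = "Date    Jan 04,2013   Jan 04,2013"

# Replace the whitespace character that is followed by two digits and a comma,
# so "Jan 04,2013" becomes the single token "Jan,04,2013"
joined = re.sub(r"\s(?=\d\d,)", ",", line)

# Plain whitespace splitting now keeps each date in one piece
fields = joined.split()
print(fields)  # ['Date', 'Jan,04,2013', 'Jan,04,2013']
```

The lookahead only matches a space that sits directly before a two-digit day and a comma, so the spacing between columns is left alone.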

Writing is trivial:

import re
import csv
with open("in.txt") as f:
    lines = [re.sub(r'\s(?=\d\d,)', ",", x).split() for x in f if x.strip()]
    zipped = zip(*lines)
    with open("out.csv", "w", newline="") as f1:
        wr = csv.writer(f1)
        wr.writerows(zipped)
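If the fields in the file really are wrapped in double quotes, as in the question, a different technique may avoid rewriting the date at all. The sketch below swaps the regex for the standard library's shlex.split, which keeps a quoted run together as one token. The in-memory sample and the variable names are assumptions for illustration, not part of the answer above.

```python
import csv
import io
import shlex

# Sample input held in memory; a real script would read the text file instead
sample = '''Number  "A1"    "A2"

Data    "ABCD"  "ABCD"

Date    "Jan 04,2013"   "Jan 04,2013"
'''

# shlex.split treats each double-quoted run as one token and drops the quotes;
# blank lines are skipped
rows = [shlex.split(line) for line in sample.splitlines() if line.strip()]

# zip(*rows) turns the rows into columns
transposed = list(zip(*rows))

# Write to an in-memory buffer; the csv writer quotes fields that contain commas
out = io.StringIO()
csv.writer(out).writerows(transposed)
result = out.getvalue()
print(result)
```

With this approach the date stays as "Jan 04,2013" in the output rather than becoming "Jan,04,2013".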

edited Nov 11, 2014 at 0:32

answered Nov 11, 2014 at 0:07

Padraic Cunningham


1 Comment

Beginner Over a year ago

This answer made me go back and look up regex syntax. Thanks

ojy · Accepted Answer · 2014-11-11 01:14:48Z

You can still use pandas.

import pandas as pd
data = pd.read_csv("input.txt", sep=r"\s+", header=None, index_col=0)
data = data.dropna()
data = data.transpose()
data.to_csv("output.csv", index = False)

In the code above, data.dropna() drops any rows with missing values (such as empty lines that were read in), and data.transpose() transposes the DataFrame.

The output looks like this:

Number,Data,Date
A1,ABCD,"Jan 04,2013"
A2,ABCD,"Jan 04,2013"
A3,ABCD,"Jan 04,2013"
A4,ABCD,"Jan 04,2013"

tdelaney · Accepted Answer · 2014-11-11 00:46:41Z

You have a couple of problems, starting with your attempts to split the file with '||' and '"', when those aren't your separators. You can build a table line-by-line and then transpose + write into the csv file.

(edit) I didn't account for spaces inside quotes. Updated to honor quotes and to use ';' as a delimiter, since your dates include commas. I used a regex to find words without spaces or words in quotes, then removed the quotes.

import csv
import re

find_cells_re = re.compile(r'\w+|"[^"]*"')

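with open('input.txt', "r") as f:
    # extract the cells from each line and strip the surrounding quotes
    table = [[cell.strip('"') for cell in find_cells_re.findall(line)]
             for line in f]
# filter out empty lines, which produce empty rows
table = [row for row in table if row]
with open("output.csv", "w", newline="") as fw:
    # ';' is the delimiter because the dates contain commas
    writer = csv.writer(fw, delimiter=';')
    # zip(*table) transposes rows into columns
    for row in zip(*table):
        writer.writerow(row)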
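As a quick check of the pattern, the following minimal sketch runs it against one sample line from the question. The sample line and variable names are chosen here for illustration.

```python
import re

# Same pattern as above: a bare word, or anything between double quotes
find_cells_re = re.compile(r'\w+|"[^"]*"')

line = 'Date    "Jan 04,2013"   "Jan 04,2013"'

# findall returns the quoted cells with their quotes still attached
raw_cells = find_cells_re.findall(line)

# Strip the quotes from each cell individually
cells = [cell.strip('"') for cell in raw_cells]
print(cells)  # ['Date', 'Jan 04,2013', 'Jan 04,2013']
```

Because the quoted alternative matches everything up to the closing quote, the space and comma inside the date do not split the cell.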

This is still going to split the date, which is the biggest issue.
@PadraicCunningham - you're right. This turned out to be a bit more complicated.

J0e3gan · Accepted Answer · 2014-11-11 08:19:50Z


Set delimiter=',' for changing to CSV.
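A minimal sketch of what that looks like with the standard csv module, assuming the row has already been transposed; the in-memory buffer stands in for a real output file.

```python
import csv
import io

# In-memory buffer used in place of an output file
out = io.StringIO()

# delimiter=',' is also the csv module's default
writer = csv.writer(out, delimiter=',')
writer.writerow(["A1", "ABCD", "Jan 04,2013"])

line = out.getvalue()
print(line)  # A1,ABCD,"Jan 04,2013"
```

The writer quotes the date automatically because it contains the delimiter, so it stays in a single column.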

edited Nov 11, 2014 at 8:19

J0e3gan


answered Nov 11, 2014 at 7:30

Prakash K


1 Comment

Johannes S. Over a year ago

Please elaborate and make your answer more detailed. As it is now, it does not provide real value.
