I'm totally new to Python. I have a very large text file and want to do two things with it: 1. Extract a certain region, which I've been able to do. 2. Transpose the extracted region and write it to a CSV file. The second step has turned out to be a little tricky; the zip function didn't do what I wanted. Here's the data printed by step 1, which I'd like to transpose.

Number  "A1"    "A2"    "A3"    "A4"

Data    "ABCD"  "ABCD"  "ABCD"  "ABCD"

Date    "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"

There's an empty line after each line of data. I need to transpose this data and save it to a CSV file (without splitting the date into two separate columns). I have many such files, and the headers are different in each one, so pandas didn't work either.

import csv
import pandas as pd
colnames= ['Number','Data','Date']
fw=open("output.csv", "w")
f= open('input.txt', "rb")
fi = csv.writer(fw, delimiter=',',quoting=csv.QUOTE_ALL)
l = f.read()
ll= [x.split(',') for x in l.split('||')]
cols1 = ll[0]
cols2 = ll[1]
cols3 = ll[2]

final_cols = [cols1, cols2, cols3]
s= zip(*final_cols)
df = pd.DataFrame(s)
df.to_csv(fw, index=False, header=False)
  • what exactly is not working? Commented Nov 11, 2014 at 0:00
  • @PadraicCunningham Using zip, the output looks something like this- [('N', 'u', 'm', 'b', 'e', 'r', Commented Nov 11, 2014 at 0:03
  • @PadraicCunningham That works fine for this particular file. But it may not work for the rest because the headers keep changing and here, I mentioned the headers. Commented Nov 11, 2014 at 0:09
  • just split into lists and then transpose with zip Commented Nov 11, 2014 at 0:10
  • @PadraicCunningham I tried doing text = data.split() for row in text: print(''.join(row)) print >> out, row but it has returned me an output like this - A1 A2 A3 A4 ABCD ABCD ABCD ABCD Jan 04 2013 Jan 04 2013 Jan 04 2013 Jan 04 2013 everything in the same column and by splitting date into three rows. Commented Nov 11, 2014 at 0:22

4 Answers

Using your data, you can use re to replace the space inside each date with a comma, so that splitting on whitespace keeps the date together:

import re
with open("in.txt") as f:
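    # Replace the space before the day ("04,") with a comma so split() keeps each date whole
    lines = [re.sub(r'\s(?=\d\d,)', ",", x).split() for x in f if x.strip()]
    # list() is needed on Python 3, where zip returns an iterator
    print(list(zip(*lines)))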
[('Number', 'Data', 'Date'), ('A1', 'ABCD', 'Jan,04,2013'), ('A2', 'ABCD', 'Jan,04,2013'), ('A3', 'ABCD', 'Jan,04,2013'), ('A4', 'ABCD', 'Jan,04,2013')]

Writing is trivial:

import re
import csv
with open("in.txt") as f:
    lines = [re.sub(r'\s(?=\d\d,)', ",", x).split() for x in f if x.strip()]
    zipped = zip(*lines)
    # newline="" stops the csv module from writing blank rows on Windows (Python 3)
    with open("out.csv", "w", newline="") as f1:
        wr = csv.writer(f1)
        wr.writerows(zipped)
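
If the double quotes shown in the question's printout are really in the file, one alternative is to let the standard library's shlex.split do the quote-aware splitting, which keeps "Jan 04,2013" as a single cell with its space intact. This is only a sketch under that assumption; transpose_quoted and sample are illustrative names, and the sample text is copied from the question rather than read from a real file.

```python
import csv
import io
import shlex


def transpose_quoted(text):
    """Split each non-blank line on whitespace, honoring double quotes, then transpose."""
    rows = [shlex.split(line) for line in text.splitlines() if line.strip()]
    return list(zip(*rows))


# Sample data copied from the question (blank lines between rows included).
sample = '''Number  "A1"    "A2"    "A3"    "A4"

Data    "ABCD"  "ABCD"  "ABCD"  "ABCD"

Date    "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"   "Jan 04,2013"
'''

transposed = transpose_quoted(sample)

# Write to an in-memory buffer here; a real script would open a file with newline="".
buffer = io.StringIO()
csv.writer(buffer).writerows(transposed)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the date contains a comma, csv.writer quotes that cell on output, so it stays in one column when the file is read back.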

1 Comment

This answer made me go back and look up regex syntax. Thanks

You can still use pandas.

import pandas as pd
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in pandas 2.2+)
data = pd.read_csv("input.txt", sep=r"\s+", header=None, index_col=0)
data = data.dropna()
data = data.transpose()
data.to_csv("output.csv", index=False)

In the above code, data.dropna() removes the empty lines, and data.transpose() transposes the DataFrame.

The output looks like this:

Number,Data,Date
A1,ABCD,"Jan 04,2013"
A2,ABCD,"Jan 04,2013"
A3,ABCD,"Jan 04,2013"
A4,ABCD,"Jan 04,2013"

You have a couple of problems, starting with your attempts to split the file with '||' and '"', when those aren't your separators. You can build a table line-by-line and then transpose + write into the csv file.

(edit) I didn't account for spaces inside quotes. I've updated the code to honor quotes and to use ';' as the delimiter, since your dates include commas. A regex finds either bare words or quoted strings, and the quotes are then stripped.

import csv
import re

find_cells_re = re.compile(r'\w+|"[^"]*"')

with open('input.txt', "r") as f:
    # extract rows, filtering out empty lines and stripping the quotes from each cell
    table = [row for row in
        ([cell.strip('"') for cell in find_cells_re.findall(line)]
         for line in f)
        if row]
with open("output.csv", "w", newline="") as fw:
    # ';' as the delimiter, because the dates contain commas
    writer = csv.writer(fw, delimiter=';')
    writer.writerows(zip(*table))
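
To see what the regex does to a single line, here is a small self-contained sketch that uses the same pattern on the question's sample rows and writes to an in-memory buffer instead of a file. The helper name split_cells and the sample_lines list are made up for illustration.

```python
import csv
import io
import re

# Same pattern as above: a bare word, or a double-quoted run (quotes included).
find_cells_re = re.compile(r'\w+|"[^"]*"')


def split_cells(line):
    """Return the cells of one line with surrounding double quotes removed."""
    return [cell.strip('"') for cell in find_cells_re.findall(line)]


# Sample rows copied from the question, including a blank separator line.
sample_lines = [
    'Number  "A1"    "A2"',
    '',
    'Data    "ABCD"  "ABCD"',
    'Date    "Jan 04,2013"   "Jan 04,2013"',
]

# Blank lines produce an empty list of cells and are filtered out.
table = [cells for cells in map(split_cells, sample_lines) if cells]

buffer = io.StringIO()
csv.writer(buffer, delimiter=';').writerows(zip(*table))
output = buffer.getvalue()
print(output)
```

With ';' as the delimiter, the comma inside the date is ordinary text, so the writer has no need to quote that cell.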

2 Comments

this is still going to split the date which is the biggest issue
@PadraicCunningham - you're right. This turned out to be a bit more complicated.

Set delimiter=',' when writing the file to produce CSV output.

1 Comment

Please elaborate and make your answer more detailed. As it is now, it does not provide real value.
