0

I have CSV files with an unwanted first character in every header field except the first column. The while loop strips the first character from those headers and writes the new header row to a new file (it exits via a counter). The else clause then writes the rest of the rows to the new file. The problem is that the else clause starts with the header row and writes it a second time. Is there a way to have else begin on the next line without breaking the for iterator? The actual files are 21 columns by 400,000+ rows. The unwanted character is a single space, but I used * in the example below to make it easier to see. Thanks for any help!

file.csv =

a,*b,*c,*d

1,2,3,4

import csv

reader = csv.reader(open('file.csv', 'rb'))

writer = csv.writer(open('file2.csv','wb'))

count = 0

for row in reader:
    while (count <= 0):
        row[1]=row[1][1:]
        row[2]=row[2][1:]
        row[3]=row[3][1:]
        writer.writerow([row[0], row[1], row[2], row[3]])
        count = count + 1
    else:
        writer.writerow([row[0], row[1], row[2], row[3]])
2
  • Removing these unwanted characters -- is this the only purpose of your code? Commented Aug 5, 2013 at 3:13
  • Yes, however, this is just a small part of optimizing a very large dataset for import to a database @djas Commented Aug 5, 2013 at 3:25

4 Answers

1

If you only want to change the header and copy the remaining lines without change:

with open('file.csv', 'r') as src, open('file2.csv', 'w') as dst:
    dst.write(next(src).replace(" ", ""))     # delete whitespaces from header
    dst.writelines(line for line in src)

If you want to do additional transformations, you can do something like this, or see this question.


1 Comment

This code will delete all white spaces in the header though -- something you may or may not want to do.
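If spaces inside a header name need to survive, one option is to strip only the leading spaces of each field. Here is a rough sketch for Python 3, assuming the header contains no quoted fields with embedded commas; the function names `clean_header_line` and `copy_with_clean_header` are made up for illustration.

```python
def clean_header_line(line):
    """Strip leading spaces from each comma-separated field of one line."""
    body = line.rstrip("\r\n")
    line_ending = line[len(body):]          # keep the original line ending
    fields = body.split(",")                # assumption: no quoted commas
    return ",".join(field.lstrip(" ") for field in fields) + line_ending


def copy_with_clean_header(src_path, dst_path):
    """Clean the header line, then copy every other line unchanged."""
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        header = next(src, None)            # None if the file is empty
        if header is None:
            return
        dst.write(clean_header_line(header))
        dst.writelines(src)                 # streams the remaining lines


print(repr(clean_header_line("a, b, c d\n")))  # 'a,b,c d\n'
```

The rest of the file is streamed line by line, so this should stay cheap even with 400,000+ rows.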
0

If all you want to do is remove spaces, you can use:

string.replace(" ", "")
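Here `string` stands for whatever str variable holds the header text. A quick comparison of three ways to drop the space, using a made-up column name, in case the difference matters for your headers:

```python
header_field = " unit price"   # hypothetical header with one leading space

removed_all = header_field.replace(" ", "")   # removes every space
removed_leading = header_field.lstrip(" ")    # removes leading spaces only
removed_first = header_field[1:]              # drops the first character, whatever it is

print(removed_all)      # unitprice
print(removed_leading)  # unit price
print(removed_first)    # unit price
```

Note that `replace` also removes the space inside the name, which may not be what you want.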


0

Hmm... It seems like your logic might be a bit backward. A bit cleaner, I think, to check if you're on the first row first. Also, a slightly more idiomatic way to remove spaces is to use string's lstrip method with no arguments to remove leading whitespace.

Why not use enumerate and check if your row is the header?

import csv

reader = csv.reader(open('file.csv', 'rb'))

writer = csv.writer(open('file2.csv','wb'))

for i, row in enumerate(reader):
    if i == 0:            
        writer.writerow([row[0], 
                         row[1].lstrip(), 
                         row[2].lstrip(), 
                         row[3].lstrip()])
    else:
        writer.writerow([row[0], row[1], row[2], row[3]])
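As for why the original code wrote the header twice: the else clause of a while loop runs as soon as the loop condition is false, unless the loop was left with break. So on the first row the while body runs once, the condition becomes false, and else runs for that same row. A small sketch that mimics the question's loop but records what would be written instead of touching a file (`trace_rows` is a hypothetical helper):

```python
def trace_rows(rows):
    """Replay the question's for/while/else logic and log each write."""
    written = []
    count = 0
    for row in rows:
        while count <= 0:
            written.append(("while", row))
            count += 1
        else:
            # Runs whenever the while condition is false and no break occurred,
            # including right after the while body finished on the first row.
            written.append(("else", row))
    return written


print(trace_rows(["header", "row1"]))
# [('while', 'header'), ('else', 'header'), ('else', 'row1')]
```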

1 Comment

Nice, now i see why my code was duplicating the header row. @jrs
0

If you have 21 columns, you don't want to write row[0], ... , row[20] by hand. Plus, you want to close your files after opening them. reader.next() gets your header (that is the Python 2 spelling; on Python 3 it is next(reader)). And strip() lets you flexibly remove unwanted leading and trailing characters.

import csv

infile = 'file1.csv'
newfile = open('file2.csv','wb')   # 'rb'/'wb' modes are for Python 2's csv module
writer = csv.writer(newfile)

with open(infile, 'rb') as f:
  reader = csv.reader(f)
  header = reader.next()   # Python 2; use next(reader) on Python 3

  newheader = []
  for c in header:
    newheader.append(c.strip(' '))
  writer.writerow(newheader)   # write the cleaned header once, after the loop

  for r in reader:
    writer.writerow(r)

newfile.close()
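On Python 3 the same idea could look roughly like this. It is only a sketch: `strip_header_spaces` is a name I made up, and it assumes the files should be opened in text mode with newline='' as the csv module documentation recommends.

```python
import csv


def strip_header_spaces(src_path, dst_path):
    """Copy a CSV, stripping spaces around each header field only."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)          # None if the file is empty
        if header is not None:
            writer.writerow([name.strip(" ") for name in header])
        writer.writerows(reader)             # remaining rows, unchanged
```

Calling `strip_header_spaces('file.csv', 'file2.csv')` handles any number of columns, and both files are closed by the with block.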
