
I want to read a CSV file in which each line is delimited by a newline character ('\n') using Python 3. This is my code:

import csv
with open('input_data.csv', newline='\n') as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]

The code above gave this error:

batch_data = [line for line in csvread]
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

After reading the post "CSV new-line character seen in unquoted field error", I also tried these alternatives I could think of:

with open('input_data.csv', 'rU', newline='\n') as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]


with open('input_data.csv', 'rU', newline="\n") as f:
    csvread = csv.reader(f)
    batch_data = [line for line in csvread]

No luck getting this right yet. Any suggestions?

I have also been reading the documentation about newline: "If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling."
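To make that documentation sentence concrete, here is a minimal sketch that reads the same bytes with and without newline='' and compares a quoted field containing an embedded \r\n. The sample data and the read_rows helper are hypothetical, written only for illustration:

```python
import csv
import os
import tempfile


def read_rows(raw: bytes, **open_kwargs) -> list:
    """Hypothetical helper: write raw bytes to a temp file, then parse it with csv.reader."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        with open(path, encoding="utf-8", **open_kwargs) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# Hypothetical sample: the quoted "note" field contains an embedded \r\n.
data = b'id,note\r\n1,"line one\r\nline two"\r\n'

with_newline = read_rows(data, newline="")  # csv receives the line endings untranslated
default_mode = read_rows(data)              # universal newlines turn \r\n into \n first

print(repr(with_newline[1][1]))  # 'line one\r\nline two'
print(repr(default_mode[1][1]))  # 'line one\nline two'
```

Both calls find two records, but only the newline='' version preserves the field's original \r\n.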

So my understanding of the newline parameter is:

1) it is required;

2) but does it mean the input file will be split into lines on an empty-space character?

  • Try opening the file in binary mode: open("filename.csv", 'rb'). Commented Nov 7, 2016 at 23:52
  • I've seen this happen when you have lone CRs (\r) in the file. Try splitting the lines and stripping whitespace. Commented Nov 8, 2016 at 0:00
  • You're supposed to pass newline='' in Python 3 and let the csv module handle the newlines. Commented Nov 8, 2016 at 1:50
  • @thebjorn: That's only valid on Python 2. Commented Nov 8, 2016 at 1:51

1 Answer

  1. newline='' is correct in all csv cases, and failing to specify it is an error in many cases. The docs recommend it for the very reason you're encountering.

  2. newline='' doesn't mean "empty space" is used for splitting; it's specifically documented on the open function:

If [newline] is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.

So with newline='' all original \r and \n characters are returned unchanged. Normally, in universal newlines mode, any newline-like sequence (\r, \n, or \r\n) is converted to \n on input. But you don't want this for CSV input, because CSV dialects are often quite picky about what constitutes a newline (the Excel dialect requires \r\n only).
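A small sketch of the difference, reading the raw lines (no csv involved) from a hypothetical file with mixed endings under three newline settings. The raw_lines helper is illustrative, not part of any library:

```python
import os
import tempfile


def raw_lines(raw: bytes, newline):
    """Hypothetical helper: write bytes to a temp file and return f.readlines()."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        with open(path, encoding="utf-8", newline=newline) as f:
            return f.readlines()
    finally:
        os.remove(path)


# Mixed endings: \r\n, then a lone \r, then \n.
data = b"a,b\r\nc,d\re,f\n"

print(raw_lines(data, ""))    # ['a,b\r\n', 'c,d\r', 'e,f\n']  recognized, left untranslated
print(raw_lines(data, None))  # ['a,b\n', 'c,d\n', 'e,f\n']    recognized and translated to \n
print(raw_lines(data, "\n"))  # ['a,b\r\n', 'c,d\re,f\n']     split on \n only
```

Note the last case: with newline='\n' a lone \r stays in the middle of a line, which is exactly what the csv reader later complains about.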

Your code should be:

import csv
with open('input_data.csv', newline='') as f:
    csvread = csv.reader(f)
    batch_data = list(csvread)

If that doesn't work, you need to look at your CSV dialect and make sure you're initializing csv.reader correctly.
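The question doesn't show the file, so this is only a sketch under an assumption taken from the comments: that the data contains a stray lone \r. It reproduces the error with newline='\n' and shows newline='' parsing the same bytes. The parse helper and sample data are hypothetical:

```python
import csv
import os
import tempfile


def parse(raw: bytes, newline):
    """Hypothetical helper: write bytes to a temp file and parse it with csv.reader."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        with open(path, encoding="utf-8", newline=newline) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# Assumed input: rows end in \n, except the first, which ends in a lone \r.
data = b"a,b\r1,2\n3,4\n"

try:
    parse(data, "\n")  # the \r stays mid-line, so the reader raises csv.Error
except csv.Error as exc:
    print("newline='\\n' fails:", exc)

print(parse(data, ""))  # [['a', 'b'], ['1', '2'], ['3', '4']]
```

The exact wording of the csv.Error message differs between Python versions, so only the exception type is worth relying on.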


5 Comments

Thanks so much for pointing me to the right documentation for the open function. Just to confirm I understand you correctly: if the input file uses '\n', the code you recommended will read and split each row properly, right?
I keep asking for confirmation because the input file is too big to open and look at directly (I can't inspect it by eye). The only thing I know about it is that "\n" separates each row. I don't know how to verify that my code splits the rows correctly by comparing the real CSV file with what the code reads in.
@enaJ: Yes. It doesn't matter what line ending convention the input file uses when you use newline='', it will treat any possible line ending as being the end of the line and return the data from that line (including the unconverted characters representing the end of the line). The csv module will recognize endings that don't match the CSV dialect and combine lines as needed to match the dialect chosen (and combine lines when the newline occurs inside a quoted field, so an embedded newline in a field doesn't turn it into multiple records on read).
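A quick way to check that claim without a real file: io.StringIO accepts the same newline argument as open(), so this sketch feeds three hypothetical inputs that differ only in their line-ending convention (\n, \r\n, lone \r), each with a newline embedded inside a quoted field:

```python
import csv
import io


def rows(text: str) -> list:
    # io.StringIO(..., newline="") mimics open(..., newline=""): any line ending
    # is recognized, but it is passed to csv.reader untranslated.
    return list(csv.reader(io.StringIO(text, newline="")))


unix = 'x,y\n1,"a\nb"\n'
windows = 'x,y\r\n1,"a\nb"\r\n'
old_mac = 'x,y\r1,"a\nb"\r'

print(rows(unix))     # [['x', 'y'], ['1', 'a\nb']]
print(rows(windows))  # [['x', 'y'], ['1', 'a\nb']]
print(rows(old_mac))  # [['x', 'y'], ['1', 'a\nb']]
```

In all three cases the quoted field spanning two physical lines comes back as a single record.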
Thanks again for your great help and patience! One more question on this: if newline='' is used for all input, how does it tell apart an input file that uses '\n' as the line delimiter from another file that uses ','?
@enaJ: What format are you using where records (as opposed to fields) are separated by commas? That question doesn't even make sense. For the record, csv is documented to ignore the value of lineterminator for readers and just treat either \r or \n as a line terminator; you can't use non-newline-y characters to separate records on read.
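The csv documentation notes that the reader is hard-coded to recognise either '\r' or '\n' as end-of-line and ignores lineterminator. A minimal sketch with a hypothetical input showing that passing lineterminator=';' to csv.reader does not make ';' a record separator:

```python
import csv
import io

text = "a,b;c,d\r\ne,f\n"

# lineterminator is accepted as a format parameter, but the reader ignores it.
reader = csv.reader(io.StringIO(text, newline=""), lineterminator=";")
result = list(reader)

print(result)  # [['a', 'b;c', 'd'], ['e', 'f']]
```

The ';' simply ends up inside the second field; records are still split on the \r\n and \n.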
