
I'm trying to read a dataset using pd.read_csv() and am getting an error. Excel can open the file just fine.

reviews = pd.read_csv('br.csv') gives the error ParserError: Error tokenizing data. C error: EOF inside string starting at line 312074

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8') returns ParserError: unexpected end of data

What can I do to fix this?

Edit: This is the dataset - https://www.kaggle.com/gnanesh/goodreads-book-reviews

  • Can you share the data? I'm guessing that, if you were to open it in a text editor, you'd see that there are unbalanced quotation marks. Commented Aug 30, 2018 at 21:38
  • Or maybe just share line 312074 of that file. Commented Aug 30, 2018 at 21:39
  • This is the data: kaggle.com/gnanesh/goodreads-book-reviews Commented Aug 30, 2018 at 21:41
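Following up on the suggestion to look at line 312074: here's a minimal sketch for doing that without opening a huge file in an editor. lines_around and odd_quote_lines are hypothetical helper names, not pandas functions. Treat the line number in the pandas message as approximate (look at a few lines around it), and note that a legitimately quoted field spanning several lines will also produce odd-count lines, so the output is a list of candidates to inspect, not proof of a bad row.

```python
from itertools import islice


def lines_around(path, lineno, context=2, encoding="utf-8"):
    """Return (line_number, text) pairs for 1-based lines lineno-context .. lineno+context."""
    start = max(lineno - context, 1)
    with open(path, encoding=encoding, newline="") as f:
        # islice is 0-based, so line N of the file is index N-1
        window = islice(f, start - 1, lineno + context)
        return [(start + i, text.rstrip("\r\n")) for i, text in enumerate(window)]


def odd_quote_lines(path, encoding="utf-8"):
    """Yield 1-based numbers of lines containing an odd number of double quotes."""
    with open(path, encoding=encoding, newline="") as f:
        for n, line in enumerate(f, start=1):
            if line.count('"') % 2 == 1:
                yield n
```

For example, `for n, text in lines_around('br.csv', 312074): print(n, text)` shows the neighbourhood of the reported line.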

3 Answers


For me adding this fixed it:

error_bad_lines=False

In my case it just skipped the last line. So instead of

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8')

use

reviews = pd.read_csv('br.csv', engine='python', encoding='utf-8', error_bad_lines=False)


1 Comment

error_bad_lines is deprecated (since pandas 1.3.0, and removed in 2.0), so use on_bad_lines instead: on_bad_lines='warn' to skip bad lines with a warning, on_bad_lines='skip' to skip them silently, or on_bad_lines='error' (the default) to raise an exception.
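If the same code has to run on pandas versions on both sides of that change, a small wrapper can pick whichever keyword the installed version understands. This is a sketch: read_csv_skip_bad_lines is a hypothetical name, not a pandas function, and it assumes pd.__version__ starts with numeric major.minor parts, which holds for released pandas versions.

```python
import pandas as pd


def read_csv_skip_bad_lines(path, **kwargs):
    """Read a CSV and drop malformed lines on both old and new pandas.

    Hypothetical helper: on pandas >= 1.3 it passes on_bad_lines="skip";
    on older versions it falls back to error_bad_lines=False.
    """
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) >= (1, 3):
        return pd.read_csv(path, on_bad_lines="skip", **kwargs)
    return pd.read_csv(path, error_bad_lines=False, **kwargs)
```

Usage would be e.g. `reviews = read_csv_skip_bad_lines('br.csv', engine='python', encoding='utf-8')`; any other read_csv keyword is passed straight through.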

In my case I couldn't skip lines, because my task requires counting the number of data records in the CSV file. The solution that worked for me was csv.QUOTE_NONE from the csv module. I found it on a website I no longer remember, but it works.

Previously I got the error: EOF .... Then I tried the parameter engine='python', but that introduced another bug in a later step that used the DataFrame. Then I tried quoting=csv.QUOTE_NONE, and it's OK now. I hope this helps.

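import csv
import pandas as pd
# full_path is your file; delimiter/encoding/header match my file; QUOTE_NONE treats " as plain text
read_file = pd.read_csv(full_path, delimiter='~', encoding='utf-16-be', header=0, quoting=csv.QUOTE_NONE)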
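To see what QUOTE_NONE changes, here's a small sketch on made-up in-memory data (the raw string is hypothetical, not taken from the Goodreads file). With the default quoting, a field that starts with a double quote and is never closed swallows the rest of the file and the C parser raises a ParserError; with QUOTE_NONE the quote is kept as an ordinary character, so every physical line stays one record. The trade-off: properly quoted fields that contain the delimiter or a newline will then be split wrongly, so this suits files that don't rely on quoting.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: record 2 starts with a quote that is never closed.
raw = 'id,text\n1,fine\n2,"unclosed quote\n3,also fine\n'

# Default quoting: the quote opens a field that runs to the end of the input.
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    print("default quoting failed:", exc)

# QUOTE_NONE: '"' is an ordinary character, so each line stays a record.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(len(df))            # 3
print(df.loc[1, "text"])  # "unclosed quote
```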


I used the following code and my issue was solved:

df = pd.read_csv(<filename>, engine="python", encoding='utf-8', on_bad_lines='skip')
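One thing to keep in mind with on_bad_lines='skip': when the problem is an unclosed quote, the parser can treat everything from that quote to the end of the file as a single unfinished record, so more than one line may disappear. This sketch uses hypothetical in-memory data (not the Goodreads file) to show it; it's worth comparing the resulting row count with what you expect before trusting the DataFrame.

```python
import io

import pandas as pd

# Hypothetical sample: record 2 opens a quote that never closes,
# so records 2 and 3 end up inside one unfinished quoted field.
raw = 'id,text\n1,fine\n2,"unclosed quote\n3,also fine\n'

# The python engine reports the unterminated quote as a bad line,
# and on_bad_lines="skip" drops it silently.
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines="skip")
print(len(df), "row(s) kept out of 3 data lines")
print(df)
```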
