
I am trying to open a file, and I know there are some errors in the file with UTF-8 encoding, so in Python 3 I would do:

    open(fileName, 'r', errors='ignore')

But now I need to use Python 2. What is the corresponding way to do this?

Below is my code after switching to codecs:

    import codecs
    import csv

    with codecs.open('data/journalName1.csv', 'rU', errors="ignore") as file:
        reader = csv.reader(file)
        for line in reader:
            print(line)

The file is here: https://www.dropbox.com/s/9qj9v5mtd4ah8nm/journalName.csv?dl=0

  • Is it possible to share the file? Commented Jun 8, 2015 at 1:35
  • The problem is not with this particular file; many files cause this error. I am just asking how to cope with it. Commented Jun 8, 2015 at 2:14

2 Answers


Python 2 does not support this with the built-in open function. Instead, you have to use the codecs module.

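    import codecs

    # An encoding must be given; without one, codecs.open() falls back to
    # the built-in open() and the errors argument has no effect.
    f = codecs.open(fileName, 'r', encoding='utf-8', errors='ignore')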

This works in both Python 2 and Python 3, in case you decide to switch Python versions in the future.
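To see why the encoding argument matters, here is a minimal sketch using only the standard library. It writes a throwaway temporary file with made-up contents (a stray 0xe6 byte, like the one in the UnicodeDecodeError reported in the comments). Called without an encoding, codecs.open returns the ordinary built-in file object, so errors='ignore' never takes effect; with encoding='utf-8', the invalid byte is dropped instead of raising.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: valid UTF-8 ("caf" + e-acute) plus a stray
# 0xe6 byte that is not followed by a valid continuation byte.
raw = b"caf\xc3\xa9 \xe6 ok\n"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(raw)

# No encoding: codecs.open() returns the plain built-in file object,
# so the errors argument is silently discarded.
plain = codecs.open(path, "r", errors="ignore")
is_wrapped = isinstance(plain, codecs.StreamReaderWriter)
plain.close()
print(is_wrapped)  # False

# Explicit encoding: the file is wrapped in a UTF-8 reader and the
# invalid byte is skipped instead of raising UnicodeDecodeError.
with codecs.open(path, "r", encoding="utf-8", errors="ignore") as f:
    text = f.read()
print(repr(text))

os.remove(path)
```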


2 Comments

It still doesn't work: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 2028: invalid continuation byte". I am going to add my code and upload the file.
Actually, when I copy the lines into another file, there is no error. But for this file, the old way passes in Python 3, while codecs.open in Python 2 still raises the error. Please help me, thank you!

For UTF-8 encoded files I would suggest the io module.

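    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import io

    # io.open decodes the file as UTF-8 and returns unicode text;
    # the with block closes the file automatically.
    with io.open('file.txt', 'r', encoding='utf8') as f:
        s = f.read()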
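Applied to the CSV file from the question, one possible version-agnostic sketch follows. The helper name read_csv_ignoring_bad_utf8 and the sample file contents are hypothetical. On Python 3 it relies on io.open with errors='ignore'; on Python 2, where the csv module works on byte strings rather than unicode, it reads the file in binary mode and decodes each cell afterwards.

```python
import csv
import io
import os
import sys
import tempfile


def read_csv_ignoring_bad_utf8(path):
    """Hypothetical helper: return CSV rows as lists of text, dropping
    any bytes that are not valid UTF-8."""
    if sys.version_info[0] >= 3:
        # Python 3: io.open does the decoding; newline='' is what the
        # csv docs recommend for files passed to csv.reader.
        with io.open(path, "r", encoding="utf-8", errors="ignore",
                     newline="") as f:
            return [row for row in csv.reader(f)]
    # Python 2: csv parses byte strings, so open in binary mode and
    # decode each cell afterwards. Delimiter, quote and newline bytes
    # are ASCII and never occur inside a multi-byte UTF-8 sequence.
    with open(path, "rb") as f:
        return [[cell.decode("utf-8", "ignore") for cell in row]
                for row in csv.reader(f)]


# Usage with a made-up two-row file containing one invalid byte (0xe6).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(b"name,title\r\nAnn,Caf\xc3\xa9 \xe6Journal\r\n")
rows = read_csv_ignoring_bad_utf8(path)
print(rows)
os.remove(path)
```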

2 Comments

"Some errors in the file with UTF-8 encoding" means that it really isn't a pure UTF-8 file.
My guess was that the OP got an error like "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)". It sometimes happens with UTF-8-decoded strings. Such an error is usually fixed by encoding explicitly to UTF-8 instead of relying on ASCII.
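As a minimal illustration of the error described in that comment (the sample string is made up), encoding text that contains U+00A0 with the ASCII codec fails, while encoding it explicitly to UTF-8 succeeds. Python 2 uses ASCII as its default codec (sys.getdefaultencoding()) for implicit conversions such as str(u'...'), which is a common source of this error.

```python
# Hypothetical text containing a non-breaking space (U+00A0), the
# character named in the error message quoted above.
text = u"price:\u00a0100"

# The ASCII codec cannot represent U+00A0, so this raises.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii: %s" % exc)

# Encoding explicitly to UTF-8 succeeds; U+00A0 becomes two bytes.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))
```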
