I have a large file with many lines. Most of the lines are UTF-8, but it looks like a few of them are not. When I try to read the lines with code like this:
import codecs

in_file = codecs.open(source, "r", "utf-8")
for line in in_file:
    # SOME OPERATIONS
    pass
I get the following error:
    for line in in_file:
  File "C:\Python27\lib\codecs.py", line 681, in next
    return self.reader.next()
  File "C:\Python27\lib\codecs.py", line 612, in next
    line = self.readline()
  File "C:\Python27\lib\codecs.py", line 527, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Python27\lib\codecs.py", line 474, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd8 in position 0: invalid continuation byte
What I would like is for lines that are not UTF-8 to be skipped without breaking the code, so that the loop moves on to the next line in the file and carries on with my operations. How can I do this with try and except?
You can tell codecs.open() to handle errors by replacing the characters it cannot decode with placeholders, or by ignoring them altogether, but you need to make sure you actually have the right codec here.
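
As a rough sketch of both options (not necessarily the best fit for your file): the first function below passes an errors policy to codecs.open(), and the second reads the file in binary mode and uses try/except to skip any line that does not decode. The function names read_lenient and iter_valid_utf8_lines are made up for illustration, and the sketch assumes lines are separated by "\n", which is safe to split on in UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence.

```python
import codecs
import io


def read_lenient(path, errors="replace"):
    """Return all lines decoded as UTF-8.

    errors="replace" turns each undecodable byte into U+FFFD;
    errors="ignore" silently drops it instead.
    """
    with codecs.open(path, "r", "utf-8", errors=errors) as in_file:
        return [line for line in in_file]


def iter_valid_utf8_lines(path):
    """Yield only the lines that decode cleanly as UTF-8 (hypothetical helper)."""
    with io.open(path, "rb") as in_file:
        for raw in in_file:  # raw is a byte string ending in b"\n"
            try:
                yield raw.decode("utf-8")
            except UnicodeDecodeError:
                # Not valid UTF-8: do nothing and move on to the next line.
                continue
```

With the second function the loop from the question becomes `for line in iter_valid_utf8_lines(source):`. Keep in mind that both approaches lose data if the file is really in some other encoding, so it is worth checking the codec before relying on either.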