Unicode error in Python when printing a list

Question

Edit: http://pastebin.com/W4iG3tjS - the file

I have a text file encoded in UTF-8 with some Cyrillic text in it. To load it, I use the following code:

import codecs
fopen = codecs.open('thefile', 'r', encoding='utf8')
fread = fopen.read()

fread dumps the file on the screen all unicodish (escape sequences). print fread displays it in readable form (ASCII I guess).

I then try to split it and write it to an empty file with no encoding:

a = fread.split()
for l in a: 
    print>>dasFile, l

But I get the following error message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)

Is there a way to dump fread.split() into a file? How can I get rid of this error?

Can you post a sample of the text?

– Joel Cornett, Commented Jun 11, 2012 at 10:04

Daniel Roseman · Accepted Answer · 2012-06-11 10:04:17Z

4

Since you've opened and read the file via codecs.open(), it's been decoded to Unicode. So to output it you need to encode it again, presumably back to UTF-8.

for l in a:
    # write() adds no newline (unlike print), so append one explicitly
    dasFile.write(l.encode('utf-8') + '\n')
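
For illustration, here is a minimal self-contained sketch of the same idea. The sample text, the output path, and the helper name write_tokens_utf8 are all hypothetical and not from the question. It opens the output in binary mode and writes UTF-8 bytes with an explicit newline, which should behave the same on Python 2 and Python 3:

```python
import os
import tempfile


def write_tokens_utf8(tokens, path):
    """Encode each unicode token as UTF-8 and write one token per line."""
    # Binary mode: the bytes are written exactly as we encode them.
    with open(path, 'wb') as out:
        for token in tokens:
            out.write(token.encode('utf-8') + b'\n')


# Hypothetical sample standing in for the decoded file contents
# (two Cyrillic words separated by a space).
text = u'\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440'
out_path = os.path.join(tempfile.mkdtemp(), 'out.txt')
write_tokens_utf8(text.split(), out_path)
```

The with block also closes the file on exit, so the buffered output is flushed without a separate .close() call.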


1 Comment

mechanical_meat Over a year ago

@abruski: make sure you .close() the file to ensure any changes are flushed from the buffer.

Keith · Accepted Answer · 2012-06-11 10:34:01Z

0

print is going to use the default encoding, which is normally "ascii". So you see that error with print. But you can open a file and write directly to it.

a = fopen.readlines() # returns a list of lines already, with line endings intact
# do something with a
dasFile.writelines(a) # doesn't add line endings, expects them to be present already.

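This assumes the lines in a are already encoded byte strings. Lines read through codecs.open() are unicode objects, so either encode them first or open dasFile with codecs.open() and an encoding as well.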
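
To see where the error comes from, the following small sketch (the sample word is hypothetical) performs the conversion explicitly. Encoding Cyrillic text with the ascii codec raises the same UnicodeEncodeError, while the utf-8 codec succeeds:

```python
# Hypothetical Cyrillic sample (4 characters), written with escapes
# so the source file itself stays pure ASCII.
word = u'\u0444\u0430\u0439\u043b'

try:
    word.encode('ascii')      # what an implicit ASCII conversion attempts
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True       # Cyrillic code points are outside range(128)

# UTF-8 can represent every Unicode code point.
utf8_bytes = word.encode('utf-8')
```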

PS. You should also investigate the io module.
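
As a rough sketch of what that could look like (the temporary paths and sample text here are assumptions made so the example is self-contained): io.open() takes an encoding argument for both reading and writing, so the file object decodes and encodes for you and no manual .encode() call is needed.

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'thefile')   # hypothetical input path
dst_path = os.path.join(tmp_dir, 'out.txt')   # hypothetical output path

# Create a small UTF-8 input file so the sketch can run on its own.
with io.open(src_path, 'w', encoding='utf-8') as f:
    f.write(u'\u0434\u0430 \u043d\u0435\u0442\n')

# Reading decodes UTF-8 bytes into unicode text.
with io.open(src_path, 'r', encoding='utf-8') as src:
    words = src.read().split()

# Writing encodes unicode text back to UTF-8; write() needs unicode here.
with io.open(dst_path, 'w', encoding='utf-8') as dst:
    for word in words:
        dst.write(word + u'\n')
```

On Python 3 the built-in open() is the same function as io.open(), so this form carries over unchanged.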

edited Jun 11, 2012 at 10:34

answered Jun 11, 2012 at 10:04


1 Comment

Keith Over a year ago

@abruski oh, yes, you split it. I forgot about that. I'll edit the answer.
