Edit: http://pastebin.com/W4iG3tjS - the file

I have a text file encoded in UTF-8 with some Cyrillic text in it. To load it, I use the following code:

import codecs
fopen = codecs.open('thefile', 'r', encoding='utf8')
fread = fopen.read()

Typing fread at the interactive prompt dumps the file on the screen as Unicode escape sequences. print fread displays it in readable form (ASCII, I guess).

I then try to split it and write it to an empty file with no encoding:

a = fread.split()
for l in a: 
    print>>dasFile, l

But I get the following error message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)

Is there a way to dump fread.split() into a file? How can I get rid of this error?

Can you post a sample of the text? Commented Jun 11, 2012 at 10:04

2 Answers

Since you've opened and read the file via codecs.open(), it's been decoded to Unicode. So to output it you need to encode it again, presumably back to UTF-8.

for l in a:
    # write() does not append a newline the way print does, so add one explicitly
    dasFile.write(l.encode('utf-8') + '\n')
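
To illustrate why the explicit encode helps, here is a minimal sketch. The sample word is an assumption (the actual file contents are only in the pastebin), and the try/except merely mimics the implicit ASCII conversion that fails in the question.

```python
# Hypothetical sample: the Cyrillic word "privet" written as escapes,
# so the snippet does not depend on the source file's own encoding.
word = u'\u043f\u0440\u0438\u0432\u0435\u0442'

try:
    word.encode('ascii')        # roughly what the implicit conversion attempts
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False            # Cyrillic code points are outside range(128)

utf8_bytes = word.encode('utf-8')   # explicit encoding succeeds
```

Each Cyrillic letter here takes two bytes in UTF-8, so the encoded result is a plain byte string that an ordinary file object can write without any further conversion.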

1 Comment

@abruski: make sure you .close() the file to ensure any changes are flushed from the buffer.

print has to turn the unicode object into bytes before it reaches the file, and it does so with the default encoding, which is normally "ascii". That is why you see the error with print. Instead, you can open a file and write to it directly.

a = fopen.readlines() # returns a list of lines already, with line endings intact
# do something with a
dasFile.writelines(a) # doesn't add line endings, expects them to be present already.

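This assumes the lines in a are already encoded byte strings. Lines read through codecs.open() are unicode objects, so they would need to be encoded first.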

PS. You should also investigate the io module.
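
As a rough sketch of what the io module offers: io.open() takes an encoding argument for both reading and writing, so the output file can do the encoding itself. The paths and the sample text below are assumptions made up for the example; only the file name 'thefile' comes from the question.

```python
import io
import os
import tempfile

# Hypothetical locations, created in a temporary directory for the demo.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'thefile')
dst_path = os.path.join(tmp_dir, 'words.txt')

# Create a small UTF-8 input file with two Cyrillic words (sample data).
with io.open(src_path, 'w', encoding='utf-8') as f:
    f.write(u'\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440\n')

# Reading decodes to unicode; split() then works on unicode text.
with io.open(src_path, 'r', encoding='utf-8') as src:
    words = src.read().split()

# Writing through io.open() encodes back to UTF-8 automatically.
with io.open(dst_path, 'w', encoding='utf-8') as dst:
    for word in words:
        dst.write(word + u'\n')   # one word per line, as print would do
```

The with blocks also close the files, so the buffered output is flushed without a separate close() call.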

1 Comment

@abruski oh, yes, you split it. I forgot about that. I'll edit the answer.
