
I'll start by saying that I've already seen the post "Strange python print behavior with unicode", but the solution offered there (setting PYTHONIOENCODING) didn't work for me.

Here's my issue:

Python 2.6.5 (r265:79063, Apr  9 2010, 11:16:46)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
>>> a = u'\xa6'
>>> print a 
¦

works just fine, however:

>>> import sys
>>> sys.stdout.write(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 0: ordinal not in range(128)

throws an error. The post I linked to at the top suggests that this is because the default console encoding is 'ascii'. However, in my case it's not:

>>> sys.stdout.encoding
'UTF-8'

So any thoughts on what's at work here and how to fix this issue?

Thanks D.

  • On Python 2.7 with a UTF-8 terminal encoding, everything seems to work. Can you try sys.stdout.write(a.encode("UTF-8")) and see what happens? Commented Nov 4, 2011 at 22:10
  • Yep, that worked... Oops, I just realized that I used the wrong Python version to generate the sample; I should have used 2.6.5. So why is this happening? A bug in pre-2.7 Python? Commented Nov 4, 2011 at 22:13
  • Apparently, when writing to stdout, your Python tries to encode the unicode object with ascii, but fails miserably. I am not sure why, but mine doesn't do that :) Commented Nov 4, 2011 at 22:17
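The workaround from the first comment can be generalised so that the codec is taken from the stream instead of being hard-coded as "UTF-8". This is only a sketch: write_unicode is a hypothetical helper name, and falling back to UTF-8 when the stream reports no encoding is an assumption (Python 2 sets sys.stdout.encoding to None when output is piped or redirected to a file).

```python
import sys


def write_unicode(text, stream=None):
    """Hypothetical helper: encode text with the stream's own encoding.

    Generalises sys.stdout.write(a.encode("UTF-8")) so the codec comes
    from the stream rather than being hard-coded.
    """
    if stream is None:
        stream = sys.stdout
    # Assumption: fall back to UTF-8 when the stream reports no encoding
    # (Python 2 sets sys.stdout.encoding to None for pipes and files).
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    stream.write(text.encode(encoding))
```

On Python 2.6 this would be called as write_unicode(a) in place of sys.stdout.write(a).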

1 Answer


This is due to a long-standing bug that was fixed in python-2.7, but too late to be back-ported to python-2.6.

The documentation states that when unicode strings are written to a file, they should be converted to byte strings using file.encoding. The print statement already honoured this (which is why print a worked), but sys.stdout.write did not: it used the default encoding returned by sys.getdefaultencoding() instead. This is usually set to "ascii" by the site module, which then deletes sys.setdefaultencoding; reloading sys makes that function available again so the default can be changed:

Python 2.6.7 (r267:88850, Aug 14 2011, 12:32:40) [GCC 4.6.2] on linux3
>>> import sys
>>> a = u'\xa6\n'
>>> sys.stdout.write(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' ...
>>> reload(sys).setdefaultencoding('utf8')
>>> sys.stdout.write(a)
¦
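To see which of the two encodings is involved on a particular interpreter, a small diagnostic like the following can help. It is a sketch, and encoding_report is a hypothetical helper name. On a setup like the one in the question one would presumably expect 'UTF-8' for the file encoding and 'ascii' for the default encoding, which is exactly the mismatch that makes sys.stdout.write fail on Python 2.6.

```python
import sys


def encoding_report(stream=None):
    """Hypothetical helper: return the two encodings involved.

    file_encoding    -- what the stream declares; print uses this
    default_encoding -- sys.getdefaultencoding(); Python 2.6's
                        file.write() uses this for unicode arguments
    """
    if stream is None:
        stream = sys.stdout
    return {
        'file_encoding': getattr(stream, 'encoding', None),
        'default_encoding': sys.getdefaultencoding(),
    }


print(encoding_report())
```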

However, a better solution might be to replace sys.stdout with a wrapper:

import sys

class StdOut(object):
    def write(self, string):
        if isinstance(string, unicode):
            # encoding is None when stdout is piped, so fall back to UTF-8
            string = string.encode(sys.__stdout__.encoding or 'utf-8')
        sys.__stdout__.write(string)

>>> sys.stdout = StdOut()
>>> sys.stdout.write(a)
¦

1 Comment

stdout has many other methods (close, flush, ...). It would be better here to replace only the write method.
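One way to follow this suggestion without losing close, flush and the rest is a wrapper that overrides only write and forwards every other attribute to the wrapped stream through __getattr__. This is a sketch under assumptions: EncodingWriter is a hypothetical name, the UTF-8 fallback is an assumption, and the text_type shim exists only so the code also runs on Python 3 (where the wrapped stream would have to be a binary one such as sys.stdout.buffer).

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3; lets the sketch run there too


class EncodingWriter(object):
    """Hypothetical wrapper: replaces write() only, forwards the rest."""

    def __init__(self, stream, encoding=None):
        self._stream = stream
        # Explicit encoding first, then the stream's own, then UTF-8
        # (the UTF-8 fallback is an assumption, not Python behaviour).
        self._encoding = (encoding
                          or getattr(stream, 'encoding', None)
                          or 'utf-8')

    def write(self, data):
        # Encode text ourselves; byte strings pass through untouched.
        if isinstance(data, text_type):
            data = data.encode(self._encoding)
        self._stream.write(data)

    def __getattr__(self, name):
        # Called only for names not found on the wrapper, so close(),
        # flush(), fileno(), isatty(), ... reach the wrapped stream.
        return getattr(self._stream, name)
```

On Python 2 it would be installed with sys.stdout = EncodingWriter(sys.__stdout__). Because __getattr__ is only consulted for attributes the wrapper does not define itself, write is intercepted while everything else reaches the real stream unchanged.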
