
After learning how to read Unicode files in my Python 3.0 web script, it's now time for me to learn how to use print() with Unicode.

I searched for how to write Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. In my case, however, the output goes to Apache, and I am sure it is capable of handling Unicode text. For some reason, though, the stdout of my web script is in ASCII.

Obviously, if I were opening the file for writing myself, I would do something like

open(filename, 'w', encoding='utf8')

but since I'm given an open stream, I resorted to using

sys.stdout.buffer.write(mytext.encode('utf-8'))

and everything seems to work. Does this violate some rule of good behavior or have any unintended consequences?
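
For context, here is a stripped-down sketch of what I'm doing (the header line and mytext are placeholders, not my real script):

#!/usr/bin/env python3
# Minimal CGI-style sketch (placeholder values): write the HTTP header and a
# UTF-8 encoded body directly to the binary layer of stdout.
import sys

mytext = "café 東京"  # placeholder non-ASCII content

sys.stdout.buffer.write(b"Content-Type: text/plain; charset=utf-8\r\n\r\n")
sys.stdout.buffer.write(mytext.encode('utf-8'))
sys.stdout.buffer.flush()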

  • You can write Unicode characters that are not supported by the current (Windows) console encoding if you use a Win32 API such as WriteConsoleW(). The win-unicode-console Python package does this for you. It has nothing to do with Apache, though. Commented Apr 11, 2015 at 14:15
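
For the record, a minimal sketch of that Windows-console route (it assumes the third-party win-unicode-console package is installed and an interactive Windows console; it does not apply to output captured by Apache):

# Sketch only: requires the win-unicode-console package and a Windows console.
import win_unicode_console

win_unicode_console.enable()  # routes Python's standard streams through WriteConsoleW()
print("héllo, 世界")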

2 Answers


I don't think you're breaking any rule, but

sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')

looks like it might be handier / less clunky.

Edit: per comments, this isn't quite right -- @Miles gave the right variant (thanks!):

sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) 

Edit: if you can arrange for the environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, making sys.stdout use utf8 automatically; but if that's unfeasible or impractical, the codecs solution stands.
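
For example, a rough in-script sketch of the codecs route (the environment-variable route would instead be something like SetEnv PYTHONIOENCODING utf8 in the Apache configuration for a CGI setup):

# Fallback inside the script, if PYTHONIOENCODING can't be set for the
# Apache-spawned process: rewrap the binary layer of stdout as UTF-8.
import codecs
import sys

sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
print("café 東京")  # now leaves the script as UTF-8 bytes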


10 Comments

With this line I get "TypeError: can't write bytes to text stream".
I think it's because stdout starts out as a text stream with the wrong ASCII codec.
Try: sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
@Miles, you have it just right -- hope you don't mind if I edit my answer to include your better idea...!
No problem. I didn't make my own answer because I'm not sure what constitutes "best practice" for a lot of Python 3 encoding issues. One thing I don't like is that, if all references to the original stdout TextIOWrapper are lost (if sys.__stdout__ is overwritten, for instance), the underlying buffer will be closed, and there is no way around that, AFAICT, other than to make sure a reference is maintained.
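
A sketch of that precaution (the _orig_stdout name is just illustrative): keep a reference to the original wrapper so it is never garbage-collected, which would close the shared buffer.

import codecs
import sys

_orig_stdout = sys.stdout  # keep the original TextIOWrapper alive (illustrative name)
sys.stdout = codecs.getwriter('utf8')(_orig_stdout.buffer)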

This is an old answer, but I'll add my version here since I first ventured here before finding my solution.

One of the issues with codecs.getwriter is that, if you are running a script, the output will be buffered (whereas normally Python's stdout flushes after every line).

sys.stdout in the console is a TextIOWrapper, so my solution uses io.TextIOWrapper directly. This also lets you set line_buffering=True or False.

For example, to make stdout backslash-escape all unencodable output instead of raising an error:

import io, sys  # needed for the rewrap below

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                              errors="backslashreplace", line_buffering=True)

To force a specific encoding (in this case utf8):

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding="utf8",
                              line_buffering=True)

A note: calling sys.stdout.detach() leaves the old sys.stdout wrapper unusable, since its buffer is handed over to the new one. Some modules use sys.__stdout__, which still refers to that original wrapper, so you may want to set it as well:

sys.stdout = sys.__stdout__ = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                                               errors="backslashreplace", line_buffering=True)
sys.stderr = sys.__stderr__ = io.TextIOWrapper(sys.stderr.detach(), encoding=sys.stdout.encoding,
                                               errors="backslashreplace", line_buffering=True)
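
As a rough sanity check of the backslashreplace variant, printing an unencodable character should come out escaped instead of raising UnicodeEncodeError:

print("snowman: \u2603")  # e.g. prints "snowman: \u2603" on an ASCII-only stdout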

2 Comments

I've seen very similar solutions in several places, but I found a problem with them (Windows, Python 3.6): if you do something like "myprog.py | head", then Python throws a strange error: "Exception ignored in: <encodings.utf_8.StreamWriter object at 0x000001D4AECF09B0>". What I found works and avoids this problem is "sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)". It's really frustrating that there appears to be no simple, obvious, clear solution to being able to print from Python without being tripped up by so many corner cases.
Interesting... I can reproduce the following error when stdout is presumably closed before reading all of it, on cmd.exe and msys bash, on 3.5 and 3.6: "Traceback (most recent call last): File "crash_in_head.py", line 7, in <module> print('hi') OSError: [Errno 22] Invalid argument" and then "Exception ignored in: <_io.TextIOWrapper name='<stdout>' encoding='utf8'> OSError: [Errno 22] Invalid argument". Your suggestion does fix that issue!
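
For reference, a sketch of the workaround from the first comment (rebuilding stdout from file descriptor 1; closefd=False keeps the real descriptor open when the wrapper is replaced or dropped):

# Sketch of the fd-based workaround: reopen file descriptor 1 as a UTF-8 text stream.
# closefd=False means replacing or dropping this object never closes the real stdout fd.
import sys

sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)
print("café 東京")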
