
Let's say

s = u"test\u0627\u0644\u0644\u0647 \u0623\u0643\u0628\u0631\u7206\u767A\u043E\u043B\u043E\u043B\u043E"

If I try to print it directly,

>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character u'\u0627' in position 4: illegal multibyte sequence
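The failure can be reproduced without a console at all. The following is only a sketch, assuming the console encoding is cp932 as the traceback reports; the names failed_at and lossy are illustrative. It also shows the lossy 'replace' error handler, which avoids the exception but substitutes every character that cp932 cannot represent.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the encoding failure without a console.
# Assumption: the console encoding is cp932, as reported in the traceback.
s = u"test\u0627\u0644\u0644\u0647"

try:
    s.encode('cp932')
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character cp932 cannot encode

# 'replace' substitutes unencodable characters instead of raising (lossy).
lossy = s.encode('cp932', 'replace')
```

This only avoids the exception; the Arabic text is lost, which is why a UTF-8 console is still needed to actually display it.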

So I switch the console code page to UTF-8 from within Python (otherwise the console won't understand my input).

import win32console
win32console.SetConsoleOutputCP(65001)
win32console.SetConsoleCP(65001)

Then I output the string explicitly encoded as UTF-8, because Python doesn't recognize code page 65001 (what chcp 65001 selects) as UTF-8 (a known bug).

>>> print s.encode('utf-8')
testالله أكبر爆発ололоTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 0] Error

As you can see, the string itself prints correctly, but an IOError is raised as soon as print reaches the trailing newline.

The following workaround works:

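def safe_print(str):
    try:
        print str.encode('utf-8')
    except IOError:
        # The text itself has been written; the error comes with the newline.
        pass
    print  # emit the newline that the failed print did not complete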

>>> safe_print(s)
testالله أكبر爆発ололо

But there must be a better way. Any suggestions?

  • I hope you don't actually call the argument str. Avoid shadowing builtins. Commented Aug 16, 2011 at 15:46
  • @Chris: How is one supposed to know what is a builtin and what isn’t? It’s a very natural thing to do. How can you guarantee clean namespace behavior without requiring universal knowledge for starting? Commented Aug 16, 2011 at 19:53
  • In this case, though, it is potentially very confusing, as the str type does have an encode method. Commented Aug 17, 2011 at 10:30
  • @tchrist - Most programming editors with a python mode should highlight builtins in a different colour. This is the easiest way to make sure you don't accidentally use one as a variable or argument name. Commented Aug 22, 2011 at 6:02
  • @tchrist: If you never use syntax highlighting, you are making your life harder than it needs to be. It catches a lot of small problems, such as (ahem) shadowing built-ins and unclosed comments/strings. Too fragile and dangerous otherwise. ;-) Commented Sep 9, 2011 at 15:35

2 Answers


Searching SO for "python utf8 windows" returns, as the first result, the question "Getting python to print in UTF8 on Windows XP with the console", which describes the problem with printing UTF-8 on Windows from Python.
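Independently of that question, the asker's workaround can be kept in one place instead of being repeated around every print. What follows is a rough sketch, not a fix for the underlying console bug: Utf8Writer is a hypothetical helper name, and it assumes (as the question's output suggests) that the bytes have already reached the console when the IOError is raised. It is demonstrated against an in-memory stream rather than a real console.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io


class Utf8Writer(object):
    """Hypothetical wrapper: encodes text as UTF-8 for a byte stream."""

    def __init__(self, raw):
        self.raw = raw  # any object with write(bytes), e.g. Python 2's sys.stdout

    def write(self, text):
        # Byte strings pass through unchanged; text is encoded as UTF-8.
        data = text if isinstance(text, bytes) else text.encode('utf-8')
        try:
            self.raw.write(data)
        except IOError:
            # Assumption: as in the question, the bytes may already have
            # reached the console when the error is raised, so it is ignored.
            pass


# Demonstration against an in-memory stream instead of a real console.
buf = io.BytesIO()
print(u"test\u0627\u0644\u0644\u0647", file=Utf8Writer(buf))
written = buf.getvalue()
```

On Python 2 one could try sys.stdout = Utf8Writer(sys.stdout) after switching the code page, but that has not been verified on a Windows console here. Swallowing IOError can also hide genuine write failures, so this is best treated as a stopgap.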



I haven't tested it on Windows, but here you can get a small initialization script for both Windows and Linux that sets up the output encoding properly, including the logging interface. The module also colors the output (including the 'logging' interface), but you can easily strip out any functionality you don't need :-).

How to invoke the non-colored variant:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from setupcon import setup_console
setup_console('utf-8', False)

and the colored variant:

import setupcon
setupcon.setup_console()
import logging
#...
if setupcon.ansi:
    logging.getLogger().addHandler(setupcon.ColoredHandler())

If the solution works for you, you can either read the documentation (in Russian) here: http://habrahabr.ru/blogs/python/117236/, or I or somebody else can translate it for you on request :-).
