I have searched and found some related problems but the way they deal with Unicode is different, so I can't apply the solutions to my problem.

I won't paste my whole code, but I'm sure this isolated example replicates the error (I'm also using wx for the GUI, so this sits inside a class):

# coding: utf-8
...
self.something = u'ЧЕТЫРЕ'
# show the Russian text in a Label on the GUI
self.ExampleLabel.SetValue(str(self.something))

In Eclipse everything works perfectly and it displays the Russian characters. However, when I run the file directly with Python, I get this error on the command line:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

I figured this has something to do with the command line not being able to output the Unicode characters and Eclipse doing some behind-the-scenes magic. How can I make it work on its own?

  • Why are you calling str() at all? I made a GUI that had to deal with Korean text and found that wxPython widgets support unicode natively. Commented Sep 2, 2012 at 9:46
  • Wow, you're right. I started out working with numbers, which is why I needed the str(), but then I moved on to phrases and it kind of stuck. I didn't know you could pass the text directly. Commented Sep 2, 2012 at 10:29
  • A word of caution: I had some issues when concatenating text, where mixing unicode objects with byte strings ended in encode errors. You just need to be careful about what you're doing and make sure you convert variables where needed. Commented Sep 2, 2012 at 16:43

3 Answers

When you call str() on something without specifying an encoding, the default encoding is used, which depends on the environment your program is running in. In Eclipse, that's different from the command line.

Don't rely on the default encoding; specify it explicitly instead:

self.ExampleLabel.SetValue(self.something.encode('utf-8'))

You may want to study the Python Unicode HOWTO to understand what encoding and str() do with unicode objects. The wxPython project has a page on Unicode usage as well.
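
As a rough illustration of the difference, the sketch below encodes the same text with the ASCII codec (which is what an implicit conversion falls back to when the default encoding is ASCII) and then with an explicit UTF-8 encoding. The variable names are made up for the example and are not part of the asker's code.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; names here are hypothetical, not from the question.
text = u'ЧЕТЫРЕ'

# Encoding with ASCII fails, because Cyrillic characters are outside
# the 0-127 range that the ASCII codec can represent.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming the encoding explicitly does not depend on the environment.
utf8_bytes = text.encode('utf-8')

# Decoding with the same codec gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
```

Each Cyrillic letter here takes two bytes in UTF-8, so the encoded value is a byte string rather than text; whether a given widget expects bytes or unicode is something to check against the wxPython documentation.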


3 Comments

Thanks, I could actually use this. Thanks for the links to further reading as well. :)
Actually, I think I may be doing something wrong, since my variable could be either an int or a unicode string. Ints have no encode method, so I couldn't use it. Is this bad design, or is there a way to handle both cases? Even when the value is an int, I want to treat it as a string, but converting with str() won't work for the unicode case.
You can do one of two things: wrap the call in a try: except: block, or test with isinstance(self.something, int) and branch your code to handle both cases.
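
One possible shape for the isinstance approach is sketched below. The helper name `to_text` and the `text_type` alias are assumptions made for this example; the version check only exists so the same snippet runs on Python 2 (where the text type is `unicode`) and Python 3 (where it is `str`).

```python
# -*- coding: utf-8 -*-
import sys

# Hypothetical alias: `unicode` on Python 2, `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_text(value, encoding='utf-8'):
    """Return `value` as text, whether it is text, bytes, or a number."""
    if isinstance(value, text_type):
        # Already text: pass it through untouched.
        return value
    if isinstance(value, bytes):
        # Byte strings are decoded with an explicit encoding.
        return value.decode(encoding)
    # Ints and other objects: convert with the text type, not str().
    return text_type(value)
```

With a helper like this, the call from the question could become something like `self.ExampleLabel.SetValue(to_text(self.something))`, so ints and Russian text go through the same path.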

Try self.something.encode('utf-8') instead.

If you use repr instead of str, it should handle the conversion for you and also cover the case where the object is not always a string. You may find, though, that it gives you an extra set of quotes, or even the unicode u prefix, in your context. repr is safer than str: str assumes the ASCII encoding, whereas repr shows your code points the same way you would see them in code. Because wrapping the result in eval is supposed to convert it back to what it was, the repr has to be in the form Python source would take, namely ASCII-safe, since most Python code is written in ASCII.
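
A small sketch of the caveat about extra quotes follows. The variable names are invented for the example. On Python 2 the result also carries the u prefix and backslash escapes in place of the Cyrillic letters, so it avoids the exception but is not the text a user would expect to read in a label.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; names here are hypothetical.
text = u'ЧЕТЫРЕ'

# repr() never raises here, but it returns the source-code form
# of the value, quotes included, rather than the bare text.
shown = repr(text)

has_quotes = shown.endswith("'")
same_as_text = (shown == text)
```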

1 Comment

I am glad this was present. I had a variation of this issue (it works in Eclipse but not on the command line, even when using .encode("utf-8")). When I changed it to repr(), it worked fine!
