I'm having trouble getting Python to handle my Unicode text correctly.

I've boiled it down to the following:

>>> print 'Høst'
Høst
>>> print u'Høst'
HÃ,st
>>> u = u'Høst'
>>> u
u'H\xf8st'

sys.stdout.encoding says that it's using UTF-8, which is most likely why the first, non-unicode, print works. If I just needed to print something, this would be fine. However, I'm constructing an XML document from data in a SQL Server database, so it really needs to be real unicode.

My data looks like perfectly good unicode data, and u'H\xf8st' looks right to me, so why does Python keep outputting it as 'HÃ,st'?

2 Answers

3

Ã¸ is \xc3\xb8 in ISO-8859-1, and \xc3\xb8 is also the UTF-8 encoding of the Unicode character U+00F8 (ø). Maybe your console actually expects ISO-8859-1 rather than UTF-8 as input, which would mean that sys.stdout.encoding is wrong.
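To see the byte-level relationship described here, a minimal sketch (written so it should run on both Python 2 and Python 3; the variable names are made up for illustration) encodes the string from the question as UTF-8 and then reads those bytes back as ISO-8859-1. It also prints what Python believes the console encoding is, which is only Python's guess and may not match what the terminal really does:

```python
import sys

text = u'H\xf8st'  # the Unicode string from the question (Høst)

# Encode to UTF-8: the single character U+00F8 becomes two bytes, 0xC3 0xB8.
utf8_bytes = text.encode('utf-8')

# Misread those same bytes as ISO-8859-1: each byte becomes one character,
# so 0xC3 0xB8 turns into U+00C3 (Ã) followed by U+00B8 (¸).
misread = utf8_bytes.decode('iso-8859-1')

print(repr(utf8_bytes))
print(repr(misread))

# What Python assumes the console uses; may be None when output is piped.
print(sys.stdout.encoding)
```

If the two-character sequence in `misread` matches what the console shows, the bytes being written are most likely UTF-8 while something downstream is interpreting them as ISO-8859-1.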

6 Comments

That fixed at least my print issues. Now I can look for something similar to fix my xml generating code.
What problems do you have with xml generation? Just encode your unicode text to the correct encoding.
@Mike I generate an XML document that contains Danish characters. Most editors tell me it's unicode and display the æøå characters correctly. When I use the same xml as input to a webservice, using Suds, it for some reason fails to accept the xml unless I do xml.decode('utf-8'), but then it won't be unicode anymore.
What's the output of type(xml)? xml.decode('utf-8') actually returns a Unicode object.
@jd you're right, it is a unicode object. Sadly that only increases the weirdness. I have a dom object, and I use it to get a unicode string with dom.toxml(encoding='utf-8'). The webservice just flat out refuses to accept this directly. I think it might be a Suds thing, as it gives me an ASCII conversion error, you know, the typical unicode conversion error. decode('utf-8') allows Suds to accept it, but the result at the other end of the webservice is just ?? rather than ø as expected.
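For reference, a minimal sketch of the types involved in this exchange, assuming the dom object comes from xml.dom.minidom in the standard library (the comments do not say which DOM implementation is used). The element name `season` is made up for illustration. With an explicit encoding, toxml returns an encoded byte string, and decode('utf-8') turns that back into a Unicode string:

```python
from xml.dom import minidom

# Build a tiny document containing the text from the question.
doc = minidom.Document()
root = doc.createElement('season')  # hypothetical element name
root.appendChild(doc.createTextNode(u'H\xf8st'))
doc.appendChild(root)

# With encoding given, toxml() returns bytes (a plain str on Python 2).
xml_bytes = doc.toxml(encoding='utf-8')

# Decoding those bytes gives a Unicode string again.
xml_text = xml_bytes.decode('utf-8')

print(type(xml_bytes))
print(type(xml_text))
```

This only illustrates which value is bytes and which is Unicode; it does not show what Suds itself expects, which is not established in this thread.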
0

Are you using ipython? Its unicode support is broken, and I'm able to reproduce your output with ipython. Try your example in the standard Python shell.

4 Comments

This is the standard Python on Ubuntu. I just tested on Windows, and it works as expected. That is: >>> print u'Høst' returns Høst and not HÃ,st.
No bracketed numbers in the output.
Unicode Python console output works on Windows but not Ubuntu. Now that's a surprising turn of events!
@Simon: Works as expected for me, Ubuntu 10.10, Python 2.6.6.
