I need to decode a "UNICODE" encoded string:

>>> id = u'abcdß'
>>> encoded_id = id.encode('utf-8')
>>> encoded_id
'abcd\xc3\x9f'

The problem I have is: using Pylons routing, I get the encoded_id variable as a unicode string u'abcd\xc3\x9f' instead of just a regular string 'abcd\xc3\x9f'.

Using Python, how can I decode my encoded_id variable, which is a unicode string?

>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/vng/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-5: ordinal not in range(128)
  • If possible, you should figure out why you are getting strings from Pylons incorrectly decoded as latin-1 (or its close relative, windows-1252) instead of utf-8 to begin with. Commented Sep 27, 2013 at 23:32

1 Answer

You have UTF-8 encoded data (there is no such thing as UNICODE encoded data).

Encode the unicode value to Latin-1, then decode from UTF-8:

encoded_id.encode('latin1').decode('utf8')

Latin-1 maps the first 256 Unicode code points (U+0000 to U+00FF) one-to-one to bytes, so the encode step recovers the original UTF-8 bytes unchanged.
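
As a quick check of that one-to-one mapping, here is a minimal sketch that round-trips every byte value through Latin-1. The variable names are illustrative only, and the snippet is written so that it should run on both Python 2 and Python 3:

```python
# Build all 256 byte values and decode them as Latin-1.
all_bytes = bytearray(range(256))
text = all_bytes.decode('latin1')

# Each decoded character has the same code point as the byte it came from.
code_points = [ord(ch) for ch in text]

# Encoding back to Latin-1 returns exactly the original bytes.
round_tripped = bytearray(text.encode('latin1'))

print(code_points == list(range(256)))
print(round_tripped == all_bytes)
```

Because no byte is lost or altered in either direction, a UTF-8 byte sequence that was wrongly decoded as Latin-1 can be restored by encoding it back.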

Demo:

>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.encode('latin1').decode('utf8')
u'abcd\xdf'
>>> print encoded_id.encode('latin1').decode('utf8')
abcdß
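
If the same repair has to be applied to many incoming values, it could be wrapped in a small helper. The sketch below is only an illustration: `fix_mojibake` is a hypothetical name, not part of Pylons or the standard library, and it assumes the text was mis-decoded as Latin-1 specifically (for windows-1252 the codec name would have to change). It leaves the input untouched when the round trip is not possible:

```python
def fix_mojibake(text):
    """Hypothetical helper: undo UTF-8 data that was wrongly decoded as Latin-1."""
    try:
        return text.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the recovered
        # bytes are not valid UTF-8; assume the text was fine and return it.
        return text


repaired = fix_mojibake(u'abcd\xc3\x9f')   # mis-decoded input gets repaired
unchanged = fix_mojibake(u'abcd\xdf')      # already-correct input is left alone
```

The fallback matters because a lone byte such as `\xdf` is not valid UTF-8, so correctly decoded text raises `UnicodeDecodeError` rather than being silently altered. Text that happens to survive the round trip would still be changed, which is why fixing the decoding at its source remains preferable.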