
I connect to a MySQL database using PyMySQL, and after executing a query I get the following string: \xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0.

This should be 5 characters in UTF-8, but when I run print s.encode('utf-8') I get this: ╨╝╨░╤А╨║╨░. The string looks like the byte representation of Unicode characters, which Python fails to recognize.

So what do I need to do to make Python process them properly?
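The garbled output is consistent with the 10 UTF-8 bytes being shown, one character per byte, by a console that uses code page 866. That code page is an assumption: it is the usual OEM code page for Russian Windows consoles, but the question doesn't say which one was in use. A Python 3 sketch that reproduces the effect:

```python
# Python 3 sketch. Assumption (not stated in the question): the console that
# printed the garbled text uses code page 866 (Russian Windows OEM code page).
text = "марка"                 # the 5-character word the bytes should spell
raw = text.encode("utf-8")     # UTF-8 uses 2 bytes for each of these Cyrillic letters
print(raw)                     # b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
print(len(text), len(raw))     # 5 10

# Displaying those bytes on a cp866 console is equivalent to decoding them as
# cp866, one character per byte, which yields mostly box-drawing characters.
garbled = raw.decode("cp866")
print(garbled)                 # ╨╝╨░╤А╨║╨░
```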

2 Answers


You want to decode (not encode) to get a unicode string from a byte string.

>>> s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
>>> us = s.decode('utf-8')
>>> print us
марка

Note that printing it may fail if your console cannot display characters outside ASCII, but you should be able to see its value in a Unicode-aware environment. I ran the above in IDLE.
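On Python 3, byte strings and text are separate types (bytes and str), and the same step looks like the sketch below. It is offered for readers on Python 3 and is not part of the original Python 2 answer:

```python
# Python 3 version of the session above: bytes.decode() turns bytes into str.
raw = b"\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0"  # 10 bytes of UTF-8
text = raw.decode("utf-8")                        # 5 characters of text
print(type(raw).__name__, len(raw))     # bytes 10
print(type(text).__name__, len(text))   # str 5
print(text)                             # марка

# str has no decode() method in Python 3, so the encode/decode mix-up
# fails loudly instead of producing garbage.
print(hasattr(text, "decode"))          # False
```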

Update

It seems what you actually have is this:

>>> s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

This is trickier because you first have to get those bytes into a byte string before you call decode. I'm not sure what the "best" way to do that is, but this works:

>>> us = ''.join(chr(ord(c)) for c in s).decode('utf-8')
>>> print us
марка

Note that you should of course be decoding it before you store it in the database as a string.
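On Python 3, ''.join(chr(ord(c)) for c in s) would just build another str rather than a byte string, so a rough equivalent of the workaround above builds a bytes object from the code points instead. A sketch, assuming every character's code point is below 256:

```python
# Python 3 sketch of the workaround: the mis-decoded str holds one character
# per original byte, so rebuilding bytes from the code points recovers the
# UTF-8 data, which can then be decoded properly.
s = "\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0"  # a str, not bytes: 10 "characters"

raw = bytes(ord(c) for c in s)   # raises ValueError if any code point is above 255
repaired = raw.decode("utf-8")
print(len(s), len(repaired))     # 10 5
print(repaired)                  # марка
```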


2 Comments

Thank you. When I tried decode, I got an error saying "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)". That must be because the string was actually u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'.
print s.encode('latin1').decode('utf8') also worked for me.
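The latin1 trick works because Latin-1 maps code points U+0000 to U+00FF to byte values 0 to 255 one-to-one, so encoding with it rebuilds exactly the bytes that the chr(ord(c)) loop produces. Running it in reverse also shows one plausible way such a string can arise: UTF-8 bytes decoded as Latin-1 somewhere along the way. That origin is an assumption; the thread does not identify where the mis-decoding happened. A Python 3 sketch:

```python
utf8_bytes = "марка".encode("utf-8")

# One way to produce the question's string: decode UTF-8 bytes as Latin-1.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake == "\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0")  # True

# Latin-1 is one-to-one for code points 0-255, so encoding with it undoes that step.
restored = mojibake.encode("latin-1")
print(restored == bytes(ord(c) for c in mojibake))  # True
print(restored.decode("utf-8"))                      # марка
```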

Mark is right: you need to decode the string. Byte strings become Unicode strings by decoding them; encoding goes the other way. This and many other details are covered in "Pragmatic Unicode, or, How Do I Stop The Pain?".
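One way to apply that rule, in the spirit of the "Unicode sandwich" advice from that talk, is to decode bytes as soon as they enter the program, work only with text inside, and encode only when sending bytes back out. A Python 3 sketch; read_text and write_text are hypothetical helper names, and UTF-8 is assumed on both ends:

```python
def read_text(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical input boundary: decode bytes into str once."""
    return data.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical output boundary: encode str into bytes once."""
    return text.encode(encoding)


incoming = b"\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0"
word = read_text(incoming)      # decode on the way in
shout = word.upper()            # all processing happens on str
outgoing = write_text(shout)    # encode on the way out
print(outgoing)                 # b'\xd0\x9c\xd0\x90\xd0\xa0\xd0\x9a\xd0\x90'
```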

1 Comment

I've fallen foul of this in the past, and now I just try to remember that one "decodes" bytes but "encodes" text.
