I was working through Python's Unicode tutorial and have a simple question. When I open a Python shell and type:
>>> unicode('\x80abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
I get the above error, as expected: Python tries to decode the byte \x80 to unicode using the ascii codec, which only covers values 0 through 127, and \x80 is 128.
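To double-check that reasoning, here is a minimal sketch that tries every single-byte value against the ascii codec and records which ones decode. The helper name `ascii_decodable` is made up for illustration, and the byte string is built through `bytearray` only so the snippet behaves the same on Python 2 and Python 3.

```python
def ascii_decodable(byte_value):
    """Return True if this single byte value decodes under the ascii codec."""
    # bytes(bytearray([...])) yields a one-byte string on Python 2 and 3.
    raw = bytes(bytearray([byte_value]))
    try:
        raw.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False

# Collect every byte value the ascii codec accepts.
ok = [b for b in range(256) if ascii_decodable(b)]

print('%d-%d' % (min(ok), max(ok)))
print(ascii_decodable(0x80))
```

If the reasoning holds, the accepted values should be exactly 0 through 127, with 0x80 being the first one rejected.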
However, if I try again using the utf-8 encoding, I get an error again, although a somewhat different one:
>>> unicode('\x80abc', 'utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
What is going on here and how should I properly go about it?
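As a starting point, here is a hedged sketch of what the second error seems to be checking, plus two possible ways around it. In UTF-8, a byte of the form 10xxxxxx is a continuation byte that may only follow a lead byte, so 0x80 cannot begin a character. Which fix is right depends on where the bytes actually came from: treating them as Latin-1 is purely an assumption here, and `'replace'` simply substitutes U+FFFD for the undecodable byte.

```python
data = b'\x80abc'  # same bytes as in the session above

# 0x80 is 0b10000000: the top two bits "10" mark a UTF-8 continuation byte,
# which is only valid after a lead byte, never at the start of a character.
is_continuation = (0x80 & 0xC0) == 0x80

# Assumption: the bytes are really Latin-1, where every byte value is valid.
as_latin1 = data.decode('latin-1')

# Alternative: keep utf-8 but replace undecodable bytes with U+FFFD.
as_replaced = data.decode('utf-8', 'replace')

# For comparison, the code point U+0080 takes two bytes in valid UTF-8.
encoded = u'\x80'.encode('utf-8')

print(is_continuation)
print(repr(as_latin1))
print(repr(as_replaced))
print(repr(encoded))
```

Neither decode call recovers information that is not there; they only differ in how the stray byte is interpreted, so knowing the real source encoding matters more than the choice of call.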