In a Python interactive session, I type the following:
>>> s = "γειά" ## it just means 'hi' in Greek
>>> s
'\x9a\x9c\xa0\xe1' ## What is this? Is it UTF-8 encoding? Is it ASCII escaped?
>>> print s
γειά
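In Python 2, a plain "..." literal is a byte string. So s holds whatever bytes the console sent for the typed characters, and its repr shows them as \x escapes. These four bytes are not UTF-8. They match γειά in cp737, the Greek DOS code page. The sketch below assumes the console was using cp737, because the post does not say which terminal or code page was in use. It is written in Python 3, where bytes plays the role of Python 2's str, and shows the mapping:

```python
# Python 3 sketch: a bytes object plays the role of a Python 2 byte string (str).
# Assumption: the console that produced s was using the Greek DOS code page cp737.
raw = b"\x9a\x9c\xa0\xe1"

# Decoding with the console's code page turns the bytes back into text.
text = raw.decode("cp737")
assert text == "\u03b3\u03b5\u03b9\u03ac"  # the four characters of "γειά"

# Encoding the text with the same code page reproduces the original bytes.
assert text.encode("cp737") == raw

# In cp737 every byte stands for exactly one character.
for byte, char in zip(raw, text):
    print(hex(byte), "->", "U+%04X" % ord(char))
```

The loop prints one line per byte, for example 0x9a -> U+03B3, which is γ. `print s` displays the text correctly only because the console interprets those same bytes with the same code page.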
And now the fun part:
>>> a = u"γειά"
>>> a
u'\u03b3\u03b5\u03b9\u03ac' # Again, what is this? Is it UTF-8 encoded? If so, how?
>>> print a
γειά
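A u"..." literal is a unicode object, which is a sequence of code points and is independent of any byte encoding. Its repr spells each non-ASCII code point as a \u escape (U+03B3 is γ). That is notation, not UTF-8. The sketch below is in Python 3, where str plays the role of Python 2's unicode, and again assumes cp737 as the console code page. It shows that UTF-8 and cp737 are simply two different byte encodings of the same four code points:

```python
# Python 3 sketch: str here corresponds to Python 2's unicode type.
a = "\u03b3\u03b5\u03b9\u03ac"  # same text as u"γειά"

# A Unicode string is a sequence of code points, not bytes.
print(["U+%04X" % ord(ch) for ch in a])  # ['U+03B3', 'U+03B5', 'U+03B9', 'U+03AC']

# UTF-8 is one possible encoding of those code points: two bytes per Greek letter.
utf8 = a.encode("utf-8")
print(utf8)  # b'\xce\xb3\xce\xb5\xce\xb9\xce\xac'

# cp737 (assumed console code page) is another encoding: one byte per letter.
cp737 = a.encode("cp737")
print(cp737)  # b'\x9a\x9c\xa0\xe1'

# Each byte string decodes back to the same text, but only with its own codec.
assert utf8.decode("utf-8") == cp737.decode("cp737") == a
```

The cp737 bytes are exactly what `s` contained in the first session. This is why `print s` and `print a` look identical on that console, even though one is raw bytes and the other is decoded text.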
I am totally confused about encodings, particularly UTF-8-encoded strings and ASCII-encoded strings. What is the difference between the two snippets above, and how do they tie in with the unicode() function?
>>> result = unicode(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 0: ordinal not in range(128)
>>> result = unicode(s, 'utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 0: invalid start byte
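unicode(s, codec) decodes a byte string using the named codec. unicode(s) with no codec falls back to Python 2's default codec, which is ascii. Both calls fail because the bytes were never ASCII or UTF-8 in the first place. ASCII only defines bytes 0 to 127, and 0x9a is 154. In UTF-8, a byte of the form 10xxxxxx (0x9a is 10011010) can only continue a multi-byte sequence, so it cannot start one. The sketch below is in Python 3, where bytes.decode(codec) stands in for unicode(s, codec), and cp737 is again an assumption about the console. It reproduces both errors and the successful decode:

```python
# Python 3 sketch: bytes.decode(codec) does what Python 2's unicode(s, codec) does.
# Assumption: the bytes came from a console using the cp737 code page.
raw = b"\x9a\x9c\xa0\xe1"

def try_decode(data, codec):
    """Return the decoded text, or a short description of why decoding failed."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "%s failed at byte %d: %s" % (exc.encoding, exc.start, exc.reason)

# ASCII only covers bytes 0-127, and 0x9a is 154.
print(try_decode(raw, "ascii"))
# UTF-8 reads 0x9a (binary 10011010) as a continuation byte, so it cannot start a character.
print(try_decode(raw, "utf-8"))
# The console's own code page decodes the bytes cleanly (ascii() keeps the output ASCII-only).
print(ascii(try_decode(raw, "cp737")))
```

In Python 2 the matching call would be unicode(s, 'cp737') or s.decode('cp737'), under the same assumption about the console's code page.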
Could someone explain to me what's happening here? Thanks in advance.