If I have a unicode string such as:
s = u'c\r\x8f\x02\x00\x00\x02\u201d'
how can I convert this to a regular (non-unicode) string? That is, I want to extract:
f = '\x00\x00\x02\u201d'
and I do not want it in unicode format. I need this because I have to convert the last four characters of s to an integer value, but if I try it with s directly:
int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
  File "C:\Python27\lib\encodings\hex_codec.py", line 24, in hex_encode
    output = binascii.b2a_hex(input)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 3: ordinal not in range(128)
Yet if I do it with f:
int(f.encode('hex'), 16)
664608376369508L
And this is the correct integer value I want to extract from s. Is there a way to do this?
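For reference, here is a minimal sketch of why the two expressions above behave differently. It is written to run on both Python 2 and Python 3, so it uses binascii.hexlify in place of .encode('hex'); the names text_tail and f_bytes are illustrative only and do not appear in the original session.

```python
import binascii

# The last four characters of s as a unicode string: four code points.
text_tail = u'\x00\x00\x02\u201d'
print(len(text_tail))            # 4
print(hex(ord(text_tail[-1])))   # 0x201d

# In Python 2, the non-unicode literal '\x00\x00\x02\u201d' does not
# interpret \u, so f holds these nine bytes (spelled out explicitly here):
f_bytes = b'\x00\x00\x02\\u201d'
print(len(f_bytes))              # 9
print(int(binascii.hexlify(f_bytes), 16))   # 664608376369508

# Hex-encoding the unicode string in Python 2 first requires an implicit
# ASCII encode, and that is the step that fails for u'\u201d'.
try:
    text_tail.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc)
```

So the integer obtained from f comes from the literal characters backslash, u, 2, 0, 1, d, not from a single character.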
\u201d in there then by definition you want a Unicode string. You should review your requirements and probably update your question with an unambiguous problem statement.
c\r\x8f\x02? Also, s is not UTF-8, and \u201d in a bytestring literal produces an actual backslash and the characters u201d, so if you really want that result (and 664608376369508L would seem to indicate you do), you've got a really weird conversion in mind. Maybe you messed up your data somewhere upstream, and you should fix it there.
\u201d character is. This protocol talks to a device that sends back s. In s, only what's listed in f contains data. I need to decode f into an integer. (The 664608376369508L I listed is not correct.) Normally, the device sends back something like: \x00\x00\x03\xcc which I can easily convert to 972, but when I receive something like: \u201d or similar, I don't know how to handle it.
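A hedged sketch of one way this could be handled, following the suggestion that the data was altered upstream. It rests on an assumption the thread does not confirm: that the device's raw bytes were decoded as Windows-1252 (cp1252) somewhere before reaching s, which would turn the byte 0x94 into u'\u201d'. The helpers to_original_bytes and be_uint32 are hypothetical names, and the 4-byte big-endian layout is inferred from the \x00\x00\x03\xcc to 972 example.

```python
import struct

def to_original_bytes(text):
    """Hypothetical helper: undo an ASSUMED cp1252 decode, one character at a time."""
    out = []
    for ch in text:
        try:
            out.append(ch.encode('cp1252'))
        except UnicodeEncodeError:
            # Characters such as u'\x8f' have no cp1252 mapping; assume the
            # byte was passed through as the code point with the same number.
            out.append(ch.encode('latin-1'))
    return b''.join(out)

def be_uint32(data):
    """Interpret exactly four bytes as a big-endian unsigned integer."""
    return struct.unpack('>I', data)[0]

# The normal case from the device.
print(be_uint32(b'\x00\x00\x03\xcc'))   # 972

# The problematic case, valid only if the cp1252 assumption holds.
s = u'c\r\x8f\x02\x00\x00\x02\u201d'
raw = to_original_bytes(s)
print(be_uint32(raw[-4:]))              # 660
```

If the assumption is right, the more robust fix is to read the device's response as raw bytes and never decode it to text in the first place. A character outside both cp1252 and latin-1 would make the helper raise, which would indicate a different upstream encoding.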