Decoding UTF-8 in Python

Question

I have an expression like this that produces the list of bytes of the utf-8 representation.

list(chr(number).encode("utf-8"))

But how can I do this in reverse?

Say I have 2 bytes [292, 200] as a list. How can I decode them into a symbol?

Hello, can you provide an example of a number?

– pyOliv, Commented May 9, 2020 at 11:33
b.decode('utf-8','replace')

– kofemann, Commented May 9, 2020 at 11:34
print (list(chr(200).encode("utf-8"))) gives [195, 136]

– Dmitry Starostin, Commented May 9, 2020 at 11:52

lenz · Accepted Answer · 2020-05-09 12:02:03Z

2

You can call bytes on a list of integers in the range 0..255.

So your example reverses like this:

>>> bytes([195, 136]).decode('utf8')
'È'

If you want the codepoint, wrap it in ord():

>>> ord(bytes([195, 136]).decode('utf8'))
200

Note: the last step only works if the byte sequence corresponds to a single Unicode character (codepoint).
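If the byte list might encode more than one character, one possible approach is to decode first and then apply ord() to each character of the result. The sketch below assumes the input is a list of integers; the helper name codepoints_from_bytes is made up for illustration and is not part of the standard library.

```python
def codepoints_from_bytes(byte_values):
    """Decode a list of UTF-8 byte values; return (text, list of code points)."""
    # bytes() raises ValueError if any value is outside 0..255;
    # decode() raises UnicodeDecodeError (a ValueError subclass) on invalid UTF-8.
    text = bytes(byte_values).decode("utf-8")
    return text, [ord(ch) for ch in text]


print(codepoints_from_bytes([195, 136]))                 # ('È', [200])
print(codepoints_from_bytes([195, 136, 226, 130, 172]))  # ('È€', [200, 8364])

# A value such as 292 is not a byte, so the conversion is rejected.
try:
    codepoints_from_bytes([292, 200])
except ValueError as exc:
    print("not a valid UTF-8 byte list:", exc)
```

Catching ValueError covers both failure modes, which may be convenient when the list comes from untrusted input.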


Robson Sampaio · Accepted Answer · 2020-05-09 12:41:52Z

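Keep in mind that only code points 0 to 127 are encoded as a single byte in UTF-8. If 'number' is larger than that, the encoding takes two or more bytes, so reading the byte list back as a plain integer with int.from_bytes will not give you 'number' again.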

# Code point 127 fits in a single UTF-8 byte
number = 127
print(f"number: {number}")
li = list(chr(number).encode("utf-8"))
print(f"List of byte: {li}")  # [127]
dec = int.from_bytes(li, byteorder='big')
print(f"Type dec: {type(dec)}")
print(f"Value dec: {dec}")  # 127, same as number

# Code point 128 needs two UTF-8 bytes
number = 128
print(f"number: {number}")
li = list(chr(number).encode("utf-8"))
print(f"List of byte: {li}")  # [194, 128]
dec = int.from_bytes(li, byteorder='big')
print(f"Type dec: {type(dec)}")
print(f"Value dec: {dec}")  # 49792, not 128

Take a look at the Python documentation on converting values.
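As a compact sketch of the same comparison, the function below puts int.from_bytes next to a UTF-8 decode for a given value. The name round_trip and the sample values are arbitrary choices for illustration.

```python
def round_trip(number):
    """Return (UTF-8 byte list, bytes read as one integer, decoded code point)."""
    encoded = list(chr(number).encode("utf-8"))
    as_int = int.from_bytes(encoded, byteorder="big")  # raw bytes as a big-endian integer
    decoded = ord(bytes(encoded).decode("utf-8"))      # UTF-8 decoded back to a code point
    return encoded, as_int, decoded


for number in (127, 128, 200):
    print(number, *round_trip(number))
```

For values above 127 the integer read from the raw bytes differs from the original number, while the decoded code point matches it.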
