If I encode a string using utf-16be and then decode the encoded bytes using utf-8, I don't get any error and the output appears to print correctly on the screen, but I'm still not able to convert the decoded string into a Python object using the json module.
import json

str = '{"foo": "bar"}'                         # note: this shadows the built-in str
encoded_str = str.encode("utf-16be")           # two bytes per character, e.g. b'\x00{\x00"...'
decoded_str = encoded_str.decode('utf-8')      # no error raised here
print(decoded_str)                             # appears to print {"foo": "bar"}
print(json.JSONDecoder().decode(decoded_str))  # raises json.JSONDecodeError
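For reference, print() hides what is going on; checking the same objects with repr() shows this on my run:

print(repr(decoded_str))
# '\x00{\x00"\x00f\x00o\x00o\x00"\x00:\x00 \x00"\x00b\x00a\x00r\x00"\x00}'
try:
    json.loads(decoded_str)
except json.JSONDecodeError as exc:
    print(exc)   # Expecting value: line 1 column 1 (char 0)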
I know that an encoded string should be decoded using the same encoding, but I'm trying to understand why it behaves this way. I want to know:

1. Why does encoding str with utf-16be and then decoding encoded_str with utf-8 not result in an error?
2. Since the encode/decode round trip raises no error and decoded_str appears to be valid JSON (as the print statement shows), why does decode(decoded_str) result in an error?
3. Why does writing the output to a file and viewing the file through the less command show it as a binary file (see the check after this list)?

   file = open("data.txt", 'w')
   file.write(decoded_str)

   When using the less command to view data.txt, it says: "data.txt" may be a binary file. See it anyway?
4. If decoded_str is invalid JSON or something else, how can I view it in its original form? (print() is printing it as valid JSON.)
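Here is the check mentioned in question 3: reading data.txt back in binary mode (assuming the default locale encoding is UTF-8, as it is on this Ubuntu setup) shows the raw bytes that end up on disk:

with open("data.txt", "w") as f:
    f.write(decoded_str)

with open("data.txt", "rb") as f:
    print(f.read())
# b'\x00{\x00"\x00f\x00o\x00o\x00"\x00:\x00 \x00"\x00b\x00a\x00r\x00"\x00}'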
I'm using Python 3.10.12 on Ubuntu 22.04.4 LTS
hexdump data.txt: you see (effectively) utf-16be-encoded text; note that decoded_str contains '\x00{\x00"\x00f\x00o\x00o\x00"\x00:\x00 \x00"\x00b\x00a\x00r\x00"\x00}'
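One way to get the original form back (a sketch; it works here only because every character is plain ASCII, so re-encoding as utf-8 reproduces the original utf-16be bytes exactly):

original_bytes = decoded_str.encode("utf-8")    # same bytes as encoded_str above
recovered = original_bytes.decode("utf-16be")   # '{"foo": "bar"}'
print(json.loads(recovered))                    # {'foo': 'bar'}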