Base64 encoding issue in Python

Question

I need to save a params file in Python. The file contains some parameters that I don't want to leave in plain text, so I encode the entire file as Base64 (I know this isn't the most secure encoding in the world, but it works for the kind of data I need to handle).

Encoding works well: I encode the content of my file (a simple text file with a custom extension) and save the result. The problem comes with decoding. I print the encoded text before saving the file and the encoded text read back from the saved file, and they are exactly the same. But for a reason I don't understand, decoding the text read from the saved file gives me this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 1: invalid start byte. Decoding the text before saving the file works fine.

Any idea how to resolve this issue?

This is my code. I have tried converting everything to bytes, to strings, and so on...

import base64

params = open('params.bpr','r').read()


paramsencoded = base64.b64encode(bytes(params,'utf-8'))

print(paramsencoded)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

newparams = open('paramsencoded.bpr','w+',encoding='utf-8')
newparams.write(str(paramsencoded))
newparams.close()

params2 = open('paramsencoded.bpr',encoding='utf-8').read()
print(params2)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

paramsdecoded = base64.b64decode(str(params2))

print(str(paramsdecoded,'utf-8'))

You don't need to decode the paramsencoded bytes value to a string each time you want to decode the Base64 data. b64decode() accepts bytes too.
– Martijn Pieters, Commented Sep 10, 2018 at 10:03
And instead of reading as text (with 'r') then encoding to bytes, why not read the file as binary (use 'rb' as the mode)?
– Martijn Pieters, Commented Sep 10, 2018 at 10:04
I'm not sure why you open the file for paramsencoded.bpr as 'w+'? You only need to write, not also read, so + can be dropped. The same remark there: open in binary mode and write the bytes value directly without first decoding to str.
– Martijn Pieters, Commented Sep 10, 2018 at 10:05
Last but not least, can you give us a sample params value that reproduces the problem?
– Martijn Pieters, Commented Sep 10, 2018 at 10:06
There is no need to use str(params2) when you just read text from the open('paramsencoded.bpr',encoding='utf-8').read() call. Where exactly is the UnicodeDecodeError thrown, on the last line?
– Martijn Pieters, Commented Sep 10, 2018 at 10:08

Martijn Pieters · Accepted Answer · 2018-09-10 10:30:42Z

Your error lies in your handling of the bytes object returned by base64.b64encode(). You called str() on the object:

newparams.write(str(paramsencoded))

That doesn't decode the bytes object:

>>> bytesvalue = b'abc='
>>> str(bytesvalue)
"b'abc='"

Note the b'...' notation. You produced the representation of the bytes object, which is a string containing Python syntax that can reproduce the value for debugging purposes (you can copy that string value and paste it into Python to re-create the same bytes value).

This may not be that easy to notice at first, as base64.b64encode() otherwise only produces output with printable ASCII bytes.

But your decoding problem originates there, because the value read back from the file includes the b' characters at the start. Those first two characters are interpreted as Base64 data too; the b is a valid Base64 character, and the ' is ignored by the parser:

>>> bytesvalue = b'hello world'
>>> base64.b64encode(bytesvalue)
b'aGVsbG8gd29ybGQ='
>>> str(base64.b64encode(bytesvalue))
"b'aGVsbG8gd29ybGQ='"
>>> base64.b64decode(str(base64.b64encode(bytesvalue)))  # with str()
b'm\xa1\x95\xb1\xb1\xbc\x81\xdd\xbd\xc9\xb1\x90'
>>> base64.b64decode(base64.b64encode(bytesvalue))       # without str()
b'hello world'

Note how the output is completely different, because the Base64 decoding now starts from the wrong place: the b supplies the first 6 bits of the first byte (making the first decoded byte one of 6C, 6D, 6E or 6F, so l, m, n or o in ASCII).
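To make the offset concrete, here is a small sketch that looks up the Base64 index of b, derives the range of possible first bytes, and compares the default lenient decoder with validate=True. The variable names are only illustrative.

```python
import base64
import binascii
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

index = alphabet.index("b")  # the six bits contributed by the stray 'b'
# Those six bits become the top of the first byte; the low two bits
# come from the next Base64 character, so four values are possible.
low, high = index << 2, (index << 2) | 0b11
print(hex(low), hex(high))

broken = str(base64.b64encode(b"hello world"))  # the repr, b'...' wrapper included

# Default (lenient) mode silently discards characters outside the alphabet,
# so the quotes vanish and the leading 'b' shifts everything that follows.
lenient = base64.b64decode(broken)
print(lenient[:1])

# validate=True rejects the stray quotes instead of skipping them.
try:
    base64.b64decode(broken, validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc
print(strict_error)
```

Passing validate=True when reading the file back would have surfaced the stray b'...' wrapper as an immediate binascii.Error rather than as garbage bytes further down the line.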

You could properly decode the value (using paramsencoded.decode('ascii') or str(paramsencoded, 'ascii')) but you shouldn't treat any of this data as text.
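For completeness, a minimal sketch of that text-mode route, in case the encoded value really does have to pass through a text file. It uses a temporary directory and a made-up sample payload in place of the real params.bpr content.

```python
import base64
import tempfile
from pathlib import Path

# Hypothetical file content, standing in for the real parameters
sample = "user=alice\npassword=s3cret\n".encode("utf-8")
encoded = base64.b64encode(sample)  # bytes object

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "paramsencoded.bpr"
    # decode('ascii') yields the bare Base64 text, without the b'...' wrapper
    path.write_text(encoded.decode("ascii"), encoding="ascii")
    text_read = path.read_text(encoding="ascii")

# b64decode() accepts an ASCII-only str as well as bytes
restored = base64.b64decode(text_read)
print(restored.decode("utf-8"))
```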

Instead, open your files in binary mode. Reading and writing then operates with bytes objects, and the base64.b64encode() and base64.b64decode() functions also operate on bytes, making for a perfect match:

import base64

with open('params.bpr', 'rb') as params_source:
    params = params_source.read()  # bytes object

params_encoded = base64.b64encode(params)
print(params_encoded.decode('ascii'))   # base64 data is always ASCII data

params_decoded = base64.b64decode(params_encoded)

with open('paramsencoded.bpr', 'wb') as new_params:
    new_params.write(params_encoded)  # write binary data

with open('paramsencoded.bpr', 'rb') as new_params:
    params_written = new_params.read()

print(params_written.decode('ascii'))  # still Base64 data, so decode as ASCII

params_decoded = base64.b64decode(params_written)  # decode the bytes value

print(params_decoded.decode('utf8'))  # assuming the original source was UTF-8

I explicitly use bytes.decode(codec) rather than str(..., codec) to avoid accidental str(...) calls.
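A short illustration of why the two spellings differ in risk: dropping the codec argument from str() silently switches from decoding to producing the representation, whereas bytes.decode() always decodes. The sample value is the Base64 text for b'hello world' used earlier.

```python
encoded = b"aGVsbG8gd29ybGQ="

with_codec = str(encoded, "ascii")    # decodes the bytes to text
without_codec = str(encoded)          # no codec: returns the repr, b'...' included
via_method = encoded.decode("ascii")  # always decodes

print(with_codec == via_method)
print(without_codec)

# Even with the codec omitted, decode() still decodes (UTF-8 is the default),
# so forgetting the argument cannot fall back to the repr.
print(encoded.decode() == via_method)
```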

Yes! This is it. My syntax with this type of code is awful; using decode('ascii') and opening the files with wb and rb to treat the data as bytes works well. Thanks a lot for the help!
