Base64 decoding and encoding give different results

Question

I have the following two encoded strings:

base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

Using an online Base64 decoder/encoder, the results are as follows (which are the right results):

base64_str1_decoded = '{"section_offset":2,"items_offset":36,"version":1}7'
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'

However, when I try to encode base64_str1_decoded or base64_str2_decoded back to Base64, I'm not able to obtain the initial Base64 strings.

For instance, the output of the following code:

import base64

base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
recoded_str2 = base64.b64encode(bytes(base64_str2_decoded, 'utf-8'))
print(recoded_str2)

# output = b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
# expected_output = eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D

I tried changing the encoding scheme but can't seem to make it work.

In the future, it may be more useful to use something like CodeSandbox to test your encoding and show a complete example when possible. The online encoder mentioned is very helpful, but its output is not a 1:1 comparison, as shown by it stripping the b' from the final output, which Python uses to denote that the string is actually a bytes object. I put together a simple example here: codesandbox.io/p/sandbox/peaceful-ramanujan-w6hk52
– TristanZimmerman, Commented Jan 21, 2023 at 2:18

Mark Tolonen · Accepted Answer · 2023-01-21 05:09:39Z

Notice that extra 7 at the end of base64_str1_decoded? That's because your input strings are incorrect: they contain the escape codes required for URLs. %3D is the escape code for =, which is what should be entered into the online decoder instead. For the 2nd string, the decoder also shows an extra ÃÜ on the next line, which you haven't shown, caused by using %3D%3D instead of ==. That online decoder is allowing invalid Base64 to be decoded.
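As a quick way to confirm the point above, here is a minimal sketch that lists the characters falling outside the standard Base64 alphabet and runs a strict decode. The helper names non_base64_chars and is_strict_base64 are hypothetical, and validate=True is a stricter mode than a plain base64.b64decode call, chosen here only to make the rejection explicit.

```python
import base64
import binascii
import string

# Standard Base64 alphabet plus the padding character.
B64_ALPHABET = set(string.ascii_letters + string.digits + '+/=')

def non_base64_chars(s):
    """Return the characters in s that are outside the standard Base64 alphabet."""
    return sorted(set(s) - B64_ALPHABET)

def is_strict_base64(s):
    """Return True if s decodes with validate=True, which rejects stray characters."""
    try:
        base64.b64decode(s, validate=True)
        return True
    except binascii.Error:
        return False

raw = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'

print(non_base64_chars(raw))                       # ['%']
print(is_strict_base64(raw))                       # False
print(is_strict_base64(raw.replace('%3D', '=')))   # True
```

The % is the only foreign character, but the 3 and D that follow it are valid Base64 digits, which is why a lenient decoder can silently produce extra bytes instead of failing.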

To decode correctly in Python, first use urllib.parse.unquote on the string to remove the escaping:

import base64
import urllib.parse

base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

# Demonstrate Python decoder detects invalid B64 encoding
try:
    print(base64.b64decode(base64_str1))
except Exception as e:
    print('Exception:', e)
try:
    print(base64.b64decode(base64_str2))
except Exception as e:
    print('Exception:', e)

# Decode after unquoting...
base64_str1_decoded = base64.b64decode(urllib.parse.unquote(base64_str1))
base64_str2_decoded = base64.b64decode(urllib.parse.unquote(base64_str2))
print(base64_str1_decoded)
print(base64_str2_decoded)

# See valid B64 encoding.
recoded_str1 = base64.b64encode(base64_str1_decoded)
recoded_str2 = base64.b64encode(base64_str2_decoded)
print(recoded_str1)
print(recoded_str2)

Output:

Exception: Invalid base64-encoded string: number of data characters (69) cannot be 1 more than a multiple of 4
Exception: Incorrect padding
b'{"section_offset":2,"items_offset":36,"version":1}'
b'{"section_offset":0,"items_offset":0,"version":1}'
b'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0='
b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='

Note that the b'' notation is Python's indication that the object is a byte string as opposed to a Unicode string and is not part of the string itself.
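If the goal is to get back the exact URL-escaped form from the question, one possible sketch is to decode the bytes result to a str and percent-encode it again with urllib.parse.quote. Passing safe='' is an assumption about how the original strings were produced; it also escapes /, which makes no difference for these two strings because they contain only letters, digits and = padding.

```python
import base64
import urllib.parse

original = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

# Remove the URL escaping, then decode the Base64 payload to bytes.
decoded = base64.b64decode(urllib.parse.unquote(original))

# Encode back to Base64; .decode('ascii') turns the bytes object into a str,
# so no b'' prefix appears when it is printed.
recoded = base64.b64encode(decoded).decode('ascii')

# Re-apply URL escaping so that = becomes %3D again.
requoted = urllib.parse.quote(recoded, safe='')

print(recoded)
print(requoted)
print(requoted == original)  # True
```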
