Decoding JSON that contains Base64

Question

I'm sending a request for a set of images to one of my API's. The API returns these images in a JSON format. This format contains data about the resource together with a single property that represents the image in Base64.

An example of the JSON being returned.

{
    "id": 548613,
    "filename": "00548613.png",
    "pictureTaken": "2020-03-30T11:38:21.003",
    "isVisible": true,
    "lotcode": 23,
    "company": "05",
    "concern": "46",
    "base64": "..."
}

The correct content of the Base64
The incorrectly parsed Base64

This is done with the Python3 requests library. When i receive a successful response from the API i attempt to decode the body to JSON using:

url = self.__url__(f"/rest/all/V1/products/{sku}/images")
headers = self.__headers__()
r = requests.get(url=url, headers=headers)
if r.status_code == 200:
    return r.json()
elif r.status_code == 404:
    return None
else:
    raise IOError(
        f"Error retrieving product '{sku}', got {r.status_code}: '{r.text}'")

Calling .json() results in the Base64 content being messed up, some parts are not there, and some are replaced with other characters. I tried manually decoding the content using r.content.decode() with the utf-8 and ascii options to see if this was the problem after seeing this post. Sadly this didn't work. I know the response from the server is correct, it works with Postman, and calling print(r.content) results in a JSON document containing the valid Base64.

How would i go about de-serializing the response from the API to get the valid Base64?

@Trenton I assume you mean the Base64, sadly i cannot share it because i do not have ownership of the serialized resources. — Harjan
– Harjan, Commented Jun 15, 2020 at 18:00
@Harjan Take a random image of a duck. Convert it to base64. Put that base64 in a request like the one you provided and see if the problem arises. If yes, post that request so we can try. — Bakuriu
– Bakuriu, Commented Jun 15, 2020 at 20:05
@Trenton I have added some Base64, it should be a 1024x1024 picture of a pink and white box when parsed correctly. — Harjan
– Harjan, Commented Jun 16, 2020 at 7:10

qedk · Accepted Answer · 2020-06-17 07:58:31Z

1

import base64
import re
...
b64text = re.search(b"\"base64\": \"(?P<base>.*)\"", r.content, flags=re.MULTILINE).group("base")
decode = base64.b64decode(b64text).decode(utf-8)

Since you're saying "calling print(r.content) results in the valid Base64", it's just a matter of decoding the base64.

edited Jun 17, 2020 at 7:58

answered Jun 15, 2020 at 17:19

qedk

5286 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Harjan Over a year ago

Good suggestion, i think this might have worked if it was just Base64 that was being returned. Calling this on my content results in the entire JSON response being decoded from Base64.

qedk Over a year ago

@Harjan then it's just a matter of extracting the base64 data from the text directly, see my answer for an example implementation.

Harjan Over a year ago

I tried your edited solution. But calling r.content or r.text results in the same corrupted Base64. Extracting works, but parsing is not possible because it still contains the illegal characters.

qedk Over a year ago

@Harjan Check your content-type and charset, the default in requests is text/html, you can set a charset utf-8, that's probably not what your API is using, set the appropriate value using r.encoding and retry. Have you tried using urrlib and reproducing this behaviour?

PruthviRaj Reddy Over a year ago

follow stackoverflow.com/questions/37225035/…

|

Collectives™ on Stack Overflow

Decoding JSON that contains Base64

1 Answer 1

7 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Linked

Related