
I need to parse this response into a pandas dataframe.

  "predictions": [
    {
        "predicted_label": 4.0,
        "distances": [3.11792408, 3.89746071, 6.32548437],
        "labels": [0.0, 1.0, 0.0]
    },
    {
        "predicted_label": 2.0,
        "distances": [1.08470316, 3.04917915, 5.25393973],
        "labels": [2.0, 2.0, 0.0]
    }
  ]

The end result I am looking for is:

   predicted_label   distances  labels
0              4.0  3.11792408     0.0
1              4.0  3.89746071     1.0
2              4.0  6.32548437     0.0

and the same for the second predicted_label, 2.0.

I tried using:

pd.json_normalize(result['predictions'], record_path='distances', meta='predicted_label', record_prefix='dist_')

but that does not give me the labels column.

  • Thank you very much both Henry and Derek O! Commented Jun 12, 2021 at 20:49

2 Answers


I am assuming result takes the following format:

result = {"predictions": [{"predicted_label": 4.0,"distances": [3.11792408, 3.89746071, 6.32548437],"labels": [0.0, 1.0, 0.0]},{"predicted_label": 2.0,"distances": [1.08470316, 3.04917915, 5.25393973],"labels": [2.0, 2.0, 0.0]}]}

If you pass result['predictions'] to pd.DataFrame, the "distances" and "labels" cells will contain lists, because each record holds a single "predicted_label" but three distances and three labels:

>>> pd.DataFrame(result['predictions'])
   predicted_label                             distances           labels
0              4.0  [3.11792408, 3.89746071, 6.32548437]  [0.0, 1.0, 0.0]
1              2.0  [1.08470316, 3.04917915, 5.25393973]  [2.0, 2.0, 0.0]

To get around this, we can set predicted_label as the index, apply pd.Series.explode to the remaining columns (credit goes to @yatu's answer here), and then reset the index. Because the exploded values came from lists, the columns have dtype object, so we can use applymap (deprecated since pandas 2.1 in favor of DataFrame.map) to convert everything to float.

Set the formatting to 8 digits after the decimal: pd.options.display.float_format = "{:.8f}".format

>>> pd.DataFrame(result['predictions']).set_index('predicted_label').apply(pd.Series.explode).reset_index().applymap(lambda x: float(x))

   predicted_label  distances     labels
0       4.00000000 3.11792408 0.00000000
1       4.00000000 3.89746071 1.00000000
2       4.00000000 6.32548437 0.00000000
3       2.00000000 1.08470316 2.00000000
4       2.00000000 3.04917915 2.00000000
5       2.00000000 5.25393973 0.00000000
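
As an alternative sketch, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns), the same reshaping can be written without set_index, and astype(float) casts the exploded object columns back to float64. The data is copied from the question; the name df is only for illustration.

```python
import pandas as pd

result = {
    "predictions": [
        {"predicted_label": 4.0,
         "distances": [3.11792408, 3.89746071, 6.32548437],
         "labels": [0.0, 1.0, 0.0]},
        {"predicted_label": 2.0,
         "distances": [1.08470316, 3.04917915, 5.25393973],
         "labels": [2.0, 2.0, 0.0]},
    ]
}

# Assumes pandas >= 1.3: explode both list columns in lockstep.
df = (
    pd.DataFrame(result["predictions"])
    .explode(["distances", "labels"])
    .reset_index(drop=True)
    .astype(float)  # exploded columns are object dtype; cast back to float64
)

print(df.dtypes)
```

Casting with astype(float) keeps the columns numeric, which should matter if the frame is later joined on one of them.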

5 Comments

FWIW, the set_index part is unnecessary; explode handles "unexplodeable" elements quite well: pd.DataFrame(result['predictions']).apply(pd.Series.explode).reset_index(drop=True)
One problem with the answer is that the returned numbers are now objects. They also have fewer digits: 3.11792408 vs 3.117924. That prevents the next "join" step that I am trying to accomplish.
That's quite peculiar - let me look into this and see if casting as a string then back to float might be a workaround.
The number of digits is not actually smaller: you can control the formatting with pd.options.display.float_format = "{:.8f}".format, which will display all 8 digits after the decimal, for example.
I updated the answer - hopefully this helps!
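
To illustrate the display-versus-storage point from these comments, here is a small sketch using a hypothetical one-column frame. It assumes the default display precision of six decimal places; the stored float64 is unchanged either way, and pd.option_context applies the wider format only inside the with block.

```python
import pandas as pd

# Hypothetical sample holding two of the distances from the question.
df = pd.DataFrame({"distances": [3.11792408, 3.89746071]})

# The stored value keeps full precision, whatever the repr shows.
stored = df["distances"].iloc[0]

# Temporarily render 8 digits after the decimal point.
with pd.option_context("display.float_format", "{:.8f}".format):
    wide = df.to_string()

print(wide)
```

Using option_context rather than assigning to pd.options.display.float_format avoids changing the formatting for the rest of the session.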

The response looks like a list of records, so you could parse them one by one and then concatenate the results:

import pandas as pd

frames = []
for dd in response['predictions']:
    # The scalar predicted_label is broadcast across the list columns.
    frames.append(pd.DataFrame(dd))
df = pd.concat(frames).reset_index(drop=True)  # reset_index if needed
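
A self-contained run of this approach on the data from the question might look like the sketch below. It assumes the lists inside each record have equal length (otherwise pd.DataFrame raises an error), and it uses ignore_index=True as a shorthand for the reset_index step.

```python
import pandas as pd

# Sample response copied from the question.
response = {
    "predictions": [
        {"predicted_label": 4.0,
         "distances": [3.11792408, 3.89746071, 6.32548437],
         "labels": [0.0, 1.0, 0.0]},
        {"predicted_label": 2.0,
         "distances": [1.08470316, 3.04917915, 5.25393973],
         "labels": [2.0, 2.0, 0.0]},
    ]
}

# Each record becomes a 3-row frame; concat stacks them with a fresh index.
frames = [pd.DataFrame(record) for record in response["predictions"]]
df = pd.concat(frames, ignore_index=True)

print(df)
print(df.dtypes)
```

Because every frame is built directly from floats, the columns stay float64 with no extra cast, which may sidestep the object-dtype concern raised in the comments on the other answer.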
