parsing json in pandas dataframe

Question

I need to parse this response into a pandas dataframe.

  "predictions": [
    {
        "predicted_label": 4.0,
        "distances": [3.11792408, 3.89746071, 6.32548437],
        "labels": [0.0, 1.0, 0.0]
    },
    {
        "predicted_label": 2.0,
        "distances": [1.08470316, 3.04917915, 5.25393973],
        "labels": [2.0, 2.0, 0.0]
    }
  ]

The end result I am looking for is:

	predicted_label	distances	labels
0	4.0	3.11792408	0.0
1	4.0	3.89746071	1.0
2	4.0	6.32548437	0.0

and the same for the second predicted_label, 2.0.

I tried using:

pd.json_normalize(result['predictions'], record_path='distances', meta='predicted_label', record_prefix='dist_')

but that does not give me the labels column.

Thank you very much both Henry and Derek O!

– PRAEMERE LLC, Commented Jun 12, 2021 at 20:49

Derek O · Accepted Answer · 2021-06-12 21:29:29Z

I am assuming result takes the following format:

result = {"predictions": [{"predicted_label": 4.0,"distances": [3.11792408, 3.89746071, 6.32548437],"labels": [0.0, 1.0, 0.0]},{"predicted_label": 2.0,"distances": [1.08470316, 3.04917915, 5.25393973],"labels": [2.0, 2.0, 0.0]}]}

If you pass result['predictions'] to pd.DataFrame, the "distances" and "labels" cells will each hold a whole list, because "predicted_label" is a single value per record while "distances" and "labels" are lists of length 3:

>>> pd.DataFrame(result['predictions'])
   predicted_label                             distances           labels
0              4.0  [3.11792408, 3.89746071, 6.32548437]  [0.0, 1.0, 0.0]
1              2.0  [1.08470316, 3.04917915, 5.25393973]  [2.0, 2.0, 0.0]

To get around this, we can set predicted_label as the index, apply pd.Series.explode to the other columns (credit goes to @yatu's answer here), and then reset the index. Because the exploded values come from lists, the resulting columns have dtype object, so we can use applymap to cast everything to float.

Set the formatting to 8 digits after the decimal: pd.options.display.float_format = "{:.8f}".format

>>> pd.DataFrame(result['predictions']).set_index('predicted_label').apply(pd.Series.explode).reset_index().applymap(lambda x: float(x))

   predicted_label  distances     labels
0       4.00000000 3.11792408 0.00000000
1       4.00000000 3.89746071 1.00000000
2       4.00000000 6.32548437 0.00000000
3       2.00000000 1.08470316 2.00000000
4       2.00000000 3.04917915 2.00000000
5       2.00000000 5.25393973 0.00000000
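
On pandas 1.3 or later, the same reshaping can probably be written more directly, since DataFrame.explode accepts a list of columns there. The following is a sketch of that alternative, not the method used in the answer above; it assumes every record's "distances" and "labels" lists have the same length, and it uses astype(float) in place of applymap for the cast.

```python
import pandas as pd

# Sample data copied from the question.
result = {
    "predictions": [
        {"predicted_label": 4.0,
         "distances": [3.11792408, 3.89746071, 6.32548437],
         "labels": [0.0, 1.0, 0.0]},
        {"predicted_label": 2.0,
         "distances": [1.08470316, 3.04917915, 5.25393973],
         "labels": [2.0, 2.0, 0.0]},
    ]
}

df = (
    pd.DataFrame(result["predictions"])
    # Explode both list columns together (needs pandas >= 1.3);
    # the scalar predicted_label is repeated for each new row.
    .explode(["distances", "labels"])
    .reset_index(drop=True)
    # Exploded columns come back as object dtype, so cast them to float.
    .astype(float)
)
print(df)
print(df.dtypes)
```

Casting with astype(float) gives float64 columns, which should also avoid the object-dtype issue raised in the comments below.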

FWIW, the set_index part is unnecessary; explode handles "unexplodeable" elements quite well: pd.DataFrame(result['predictions']).apply(pd.Series.explode).reset_index(drop=True)
One problem with the answer is that the returned numbers are now objects. They also have fewer digits: 3.11792408 vs 3.117924. That prevents the next "join" step that I am trying to accomplish.
That's quite peculiar. Let me look into this and see if casting to a string and then back to float might be a workaround.
The number of digits is not actually lower: you can control the formatting with pd.options.display.float_format = "{:.8f}".format, which will display all 8 digits after the decimal, for example.
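
The point in the last comment can be checked with a small sketch: the display format only changes how floats are printed, not what is stored. The Series below is built from the question's distances purely for illustration, and pd.option_context is used so the setting stays local to the block.

```python
import pandas as pd

s = pd.Series([3.11792408, 3.89746071, 6.32548437], name="distances")

# With default display settings, pandas rounds the printed value.
default_repr = repr(s)

# Temporarily show 8 digits after the decimal point.
with pd.option_context("display.float_format", "{:.8f}".format):
    full_repr = repr(s)

# The stored value is unchanged either way.
stored_exactly = s.iloc[0] == 3.11792408

print(default_repr)
print(full_repr)
print(stored_exactly)
```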

SCKU · Answer · edited 2021-06-12 20:42:54Z


The response looks like a list of records. You could parse them one by one, then concatenate the results:

import pandas as pd

# Build one small frame per record, then stack them.
frames = []
for dd in response['predictions']:
    frames.append(pd.DataFrame(dd))
df = pd.concat(frames).reset_index(drop=True)  # reset_index if needed
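
A minimal sketch of why the per-record approach works, using the sample data from the question: when a dict mixes a scalar with equal-length lists, pd.DataFrame repeats the scalar on every row. The name response follows this answer and is assumed to hold the same dict that the other answer calls result.

```python
import pandas as pd

# Assumed to be the parsed JSON from the question.
response = {
    "predictions": [
        {"predicted_label": 4.0,
         "distances": [3.11792408, 3.89746071, 6.32548437],
         "labels": [0.0, 1.0, 0.0]},
        {"predicted_label": 2.0,
         "distances": [1.08470316, 3.04917915, 5.25393973],
         "labels": [2.0, 2.0, 0.0]},
    ]
}

# One record becomes a 3-row frame; predicted_label is broadcast.
first = pd.DataFrame(response["predictions"][0])
print(first)

# Stack all per-record frames and renumber the rows 0..n-1.
df = pd.concat(
    [pd.DataFrame(rec) for rec in response["predictions"]],
    ignore_index=True,
)
print(df.dtypes)
```

Because each per-record frame is built directly from numeric lists, the columns should come out as float64 without an extra conversion step.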

answered Jun 12, 2021 at 20:01 by SCKU; edited Jun 12, 2021 at 20:42 by Derek O
