The URL below returns a JSON response. I need to read this JSON into a pandas DataFrame and perform operations on it. The JSON is nested: it contains multiple lists and dicts within dicts.
URL: 'http://api.nobelprize.org/v1/laureate.json'
I have tried the code below:
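import pandas as pd
import requests

# Fetch the laureate data and let requests decode the JSON body
resp = requests.get('http://api.nobelprize.org/v1/laureate.json')
# Build one row per entry in the top-level "laureates" list
df = pd.json_normalize(resp.json(), record_path=['laureates'])
print(df.head(5))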
Output:
id firstname surname born died \
0 1 Wilhelm Conrad Röntgen 1845-03-27 1923-02-10
1 2 Hendrik A. Lorentz 1853-07-18 1928-02-04
2 3 Pieter Zeeman 1865-05-25 1943-10-09
3 4 Henri Becquerel 1852-12-15 1908-08-25
4 5 Pierre Curie 1859-05-15 1906-04-19
bornCountry bornCountryCode bornCity \
0 Prussia (now Germany) DE Lennep (now Remscheid)
1 the Netherlands NL Arnhem
2 the Netherlands NL Zonnemaire
3 France FR Paris
4 France FR Paris
diedCountry diedCountryCode diedCity gender \
0 Germany DE Munich male
1 the Netherlands NL NaN male
2 the Netherlands NL Amsterdam male
3 France FR NaN male
4 France FR Paris male
prizes
0 [{'year': '1901', 'category': 'physics', 'shar...
1 [{'year': '1902', 'category': 'physics', 'shar...
2 [{'year': '1902', 'category': 'physics', 'shar...
3 [{'year': '1903', 'category': 'physics', 'shar...
4 [{'year': '1903', 'category': 'physics', 'shar...
But here the prizes column comes back as a list. If I create a separate DataFrame for prizes, its affiliations column is in turn a list. I want all nested fields to come out as separate columns. Some entries may not have prizes, so that case needs to be handled as well.
I went through this article (https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd). It looks like I'll have to use the meta and errors='ignore' parameters of json_normalize here, but I have not been able to get it working. I'd appreciate your input. Thanks.
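
To make the goal concrete, here is a minimal sketch of the kind of result I am after, run on a small made-up sample rather than the real response. It assumes that a nested record_path of ['prizes', 'affiliations'] combined with meta and errors='ignore' is the right direction. The sample data, the helper name flatten_laureates, and the choice of meta fields are my own placeholders, and the padding step (filling in an empty prize or affiliation where one is missing or is not a dict) is a defensive guess, not something taken from the article.

```python
import pandas as pd

# Hypothetical sample that mirrors the shape of the API response
# (made-up values, not real API output).
sample = {
    "laureates": [
        {
            "id": "1",
            "firstname": "A",
            "prizes": [
                {
                    "year": "1901",
                    "category": "physics",
                    "affiliations": [
                        {"name": "Univ X", "city": "Town", "country": "Land"}
                    ],
                }
            ],
        },
        # Entry with no "prizes" key at all
        {"id": "2", "firstname": "B"},
        # Entry whose affiliations list holds a non-dict placeholder
        {
            "id": "3",
            "firstname": "C",
            "prizes": [
                {"year": "1950", "category": "peace", "affiliations": [[]]}
            ],
        },
    ]
}


def flatten_laureates(laureates):
    """Return one row per (laureate, prize, affiliation) combination."""
    cleaned = []
    for person in laureates:
        person = dict(person)  # shallow copy so the input is not mutated
        fixed_prizes = []
        # Pad a missing or empty prize list with one empty prize
        for prize in person.get("prizes") or [{}]:
            prize = dict(prize)
            # Keep only dict-shaped affiliations; pad with one empty dict
            affs = [a for a in prize.get("affiliations") or [] if isinstance(a, dict)]
            prize["affiliations"] = affs or [{}]
            fixed_prizes.append(prize)
        person["prizes"] = fixed_prizes
        cleaned.append(person)

    return pd.json_normalize(
        cleaned,
        record_path=["prizes", "affiliations"],  # innermost list becomes the rows
        meta=["id", "firstname", ["prizes", "year"], ["prizes", "category"]],
        record_prefix="affiliation.",
        errors="ignore",  # missing meta keys become NaN instead of raising
    )


flat = flatten_laureates(sample["laureates"])
print(flat)
```

With this padding, laureates without prizes still get a row, with NaN in the prize and affiliation columns. On the real data the meta list would need the remaining laureate and prize fields added.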