Parse JSON in a column of a DataFrame

Question

I have a DataFrame (df) with the following structure:

Input:

ID data
1  {"customerinfo":{"zip":834247}}
2  {"score":535,"customerinfo":{"zip":834244}}

I want to parse the JSON data into new columns, to get the following:

Expected Output:

ID zipcode     score
1  834247      NULL
2  834244      535

Original Solution:

import json

df['bureauScore'] = df['data'].transform(lambda x: json.loads(x)['score'])
df['zipcode'] = df['data'].transform(lambda x: json.loads(x)['customerinfo']['zip'])

Problem:

If a field is missing, the code fails, so I tried to add a get function as shown below, but that fails with the following error:

Attempted Solution

df['bureauScore'] = df['data'].transform(lambda x: json.loads(get(x))['score'])

Error:

NameError: name 'get' is not defined
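The NameError occurs because get is not a built-in function; it is a method of the dict that json.loads returns. A minimal sketch of what the attempt was likely aiming for, calling dict.get on the parsed object so a missing key gives None instead of the KeyError that subscripting raises. The sample df is an assumption reconstructed from the input above, with the zip keys quoted so the strings are valid JSON:

```python
import json

import pandas as pd

# Sample data reconstructed from the question's input (keys quoted so it is valid JSON)
df = pd.DataFrame({
    "ID": [1, 2],
    "data": [
        '{"customerinfo":{"zip":834247}}',
        '{"score":535,"customerinfo":{"zip":834244}}',
    ],
})

# dict.get returns None instead of raising KeyError when a key is missing;
# the nested lookup falls back to an empty dict if "customerinfo" is absent
df["bureauScore"] = df["data"].map(lambda x: json.loads(x).get("score"))
df["zipcode"] = df["data"].map(lambda x: json.loads(x).get("customerinfo", {}).get("zip"))

print(df)
```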

Any help would be appreciated.

P.S.: I know about json_normalize, but I have multiple fields, so I wanted this approach.
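For comparison, pd.json_normalize (a top-level pandas function since pandas 1.0) flattens a list of parsed dicts in one call, naming nested keys with dots (e.g. customerinfo.zip) and filling missing keys with NaN. A sketch assuming the same sample data; the renaming to zipcode and bureauScore is only there to match the columns used in this question:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "data": [
        '{"customerinfo":{"zip":834247}}',
        '{"score":535,"customerinfo":{"zip":834244}}',
    ],
})

# Parse every JSON string, then flatten nested dicts into dotted column names
flat = pd.json_normalize(df["data"].map(json.loads).tolist())

# Keep only the wanted fields and give them the question's column names
flat = flat[["customerinfo.zip", "score"]].rename(
    columns={"customerinfo.zip": "zipcode", "score": "bureauScore"}
)

# json_normalize returns a fresh RangeIndex, so align by position before joining
out = pd.concat([df[["ID"]].reset_index(drop=True), flat], axis=1)
print(out)
```

Selecting the columns explicitly keeps only the fields of interest, even when the JSON carries many more.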

jezrael · Accepted Answer · 2020-10-26 07:19:10Z


Use Series.str.get; it returns NaN when a key is missing:

import json

# parse each JSON string only once into a helper Series s of dicts
s = df['data'].transform(json.loads)

# Series.str.get returns NaN when the key is missing
df['bureauScore'] = s.str.get('score')
df['zipcode'] = s.str.get('customerinfo').str.get('zip')

print(df)

   ID                                         data  bureauScore  zipcode
0   1              {"customerinfo":{"zip":834247}}          NaN   834247
1   2  {"score":535,"customerinfo":{"zip":834244}}        535.0   834244
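Since the question mentions multiple fields, the per-field lookups can be driven by a mapping from new column name to key path. This minimal sketch walks each path with plain dict.get rather than chaining Series.str.get; get_path and extract_fields are hypothetical helpers, not pandas API:

```python
import json

import pandas as pd


def get_path(obj, path):
    """Hypothetical helper: follow a tuple of keys, returning None if any is missing."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def extract_fields(df, column, fields):
    """Hypothetical helper: add one column per {new_name: key_path} entry."""
    parsed = df[column].map(json.loads)  # parse each JSON string only once
    out = df.copy()
    for name, path in fields.items():
        # map runs immediately, so capturing `path` in the lambda is safe here
        out[name] = parsed.map(lambda d: get_path(d, path))
    return out


df = pd.DataFrame({
    "ID": [1, 2],
    "data": [
        '{"customerinfo":{"zip":834247}}',
        '{"score":535,"customerinfo":{"zip":834244}}',
    ],
})

result = extract_fields(
    df,
    "data",
    {"bureauScore": ("score",), "zipcode": ("customerinfo", "zip")},
)
print(result[["ID", "zipcode", "bureauScore"]])
```

Adding a field then means adding one entry to the mapping. As in the output above, a column with missing values is typically stored as float64 (535.0); if integers are preferred, .astype('Int64') converts it to pandas' nullable integer dtype.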
