1

I have a dataframe(df) with the following structure.

Input:

ID data
1 {"customerinfo":{zip:834247}}
2 {"score":535,"customerinfo":{zip:834244}}

I wanted to parse the json data as a new columns, such as to get the following:

Expected Output:

ID zipcode     score
1  834247      NULL
2  834244      535

Original Solution:

df['bureauScore'] = df['data'].transform(lambda x: json.loads(x)['score'])
df['zipcode'] = df['data'].transform(lambda x: json.loads(x)['customerinfo']['zipcode'])

Problem:

If a field is missing the code fails, thus I tired to add a get function as shown below, but that fails here with the following error

Attempted Solution

df['bureauScore'] = df['data'].transform(lambda x: json.loads(get(x))['score'])

Error:

NameError: name 'get' is not defined

Any help would be appreciated.

P.S: I know about json_normalize but I have multiple fields, thus wanted this approach

1 Answer 1

1

Use Series.str.get - it return NaN if no match:

#convert to jsons only once to helper Series s
s = df['data'].transform(json.loads)

df['bureauScore'] = s.str.get('score')
df['zipcode'] = s.str.get('customerinfo').str.get('zip')

print (df)

   ID                                         data  bureauScore  zipcode
0   1              {"customerinfo":{"zip":834247}}          NaN   834247
1   2  {"score":535,"customerinfo":{"zip":834244}}        535.0   834244
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.