How to split a nested column into a few new columns in Python?

Question

In Python, I'm working with a dataset to determine how user reactions relate to post reach. My dataset is structured as follows, with the Reaction column being nested:

   PostID    Reach    Reaction
   01        787767   {"like":49852,"wow":8017,"haha":3200,"anger":3}
   02        973183   {"like":57911,"wow":3013,"haha":8017,"anger":15}
   03        ...      ...

I want to restructure the data and create separate reaction columns so that the dataframe looks like this:

   PostID    Reach    like     wow     haha     anger
   01        787767   49852    8017    3200     3
   02        973183   57911    3013    8017     15
   03        ...      ...

Is that a column of JSON or dicts?

– cs95, Commented Jan 14, 2018 at 22:27

The column is json, thx for the answers. Both worked!

– Lina Linutina, Commented Jan 15, 2018 at 0:36

DYZ · Accepted Answer · 2018-01-14 22:35:31Z

5

Convert the dictionaries to Pandas Series:

pd.concat([df.iloc[:,:2], df.Reaction.apply(pd.Series)],axis=1)
#   PostID   Reach  anger  haha   like   wow
#0       1  787767      3  3200  49852  8017
#1       2  973183     15  8017  57911  3013
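
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. It assumes the Reaction column already holds Python dicts; the sample frame is rebuilt from the two rows shown in the question, and the names `expanded` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; Reaction holds Python dicts here.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        {"like": 49852, "wow": 8017, "haha": 3200, "anger": 3},
        {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
    ],
})

# Turn every dict into a Series, which gives one column per reaction key.
expanded = df["Reaction"].apply(pd.Series)

# Put the first two columns (PostID, Reach) next to the expanded reactions.
result = pd.concat([df.iloc[:, :2], expanded], axis=1)
print(result)
```

The column order of the expanded part can differ between pandas versions, so it is safer to select reaction columns by name than by position.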

edited Jan 14, 2018 at 22:35

answered Jan 14, 2018 at 22:34

DYZ


5 Comments

cs95 Over a year ago

apply(pd.Series) is slow, and won't work if it's JSON data.

DYZ Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ Why do you think it's JSON data? Any JSON would become a dict before it becomes a part of a DataFrame.

DYZ Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ What is JSON data in Python?

cs95 Over a year ago

Also, no guarantee that df.iloc[:,:2] should work for OP's actual data, drop would be better here.

cs95 Over a year ago

I'm sorry, I wasn't clear. What I meant to say is that I have reason to believe that it might be a column of strings, is all. Anyway, my answer caters to both situations.
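
If it is unclear whether the column holds JSON strings or dicts, one option is to normalise every cell first. The sketch below is only an illustration of that idea: `to_dict` is a hypothetical helper name, the sample frame deliberately mixes both kinds of value, and `drop` is used instead of positional slicing, as suggested in the comments.

```python
import json

import pandas as pd


def to_dict(value):
    """Return value as a dict, parsing it first if it is a JSON string."""
    return json.loads(value) if isinstance(value, str) else value


# Hypothetical sample: one cell is a JSON string, the other already a dict.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        '{"like":49852,"wow":8017,"haha":3200,"anger":3}',
        {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
    ],
})

# Normalise every cell to a dict, then build one column per key.
records = df["Reaction"].map(to_dict).tolist()
v = pd.DataFrame(records)

# Drop the nested column by name rather than relying on column positions.
result = pd.concat([df.drop(columns="Reaction"), v], axis=1)
print(result)
```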

cs95 · Accepted Answer · 2018-01-14 22:32:23Z

There are lots of ways to do this, assuming you have a column of JSON strings. One simple way is to apply json.loads to convert each string to a dict, and then load the result with DataFrame.from_records or json_normalize.

import json  # pd.json was removed from pandas; use the standard library

v = pd.DataFrame.from_records(df.Reaction.apply(json.loads).tolist())

Or,

v = pd.json_normalize(df.Reaction.apply(json.loads).tolist())

Finally, concatenate the result.

pd.concat([df.drop(columns='Reaction'), v], axis=1)

   PostID   Reach  anger  haha   like   wow
0       1  787767      3  3200  49852  8017
1       2  973183     15  8017  57911  3013
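
Here is a self-contained sketch of the JSON-string route, assuming a recent pandas where `json_normalize` is available as `pd.json_normalize`. The sample frame is rebuilt from the question, and the names `parsed`, `v_records` and `v_norm` are only illustrative.

```python
import json

import pandas as pd

# Sample data rebuilt from the question; Reaction holds JSON strings here.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        '{"like":49852,"wow":8017,"haha":3200,"anger":3}',
        '{"like":57911,"wow":3013,"haha":8017,"anger":15}',
    ],
})

# Parse each JSON string into a dict.
parsed = df["Reaction"].apply(json.loads).tolist()

# Either call builds a frame with one column per reaction key.
v_records = pd.DataFrame.from_records(parsed)
v_norm = pd.json_normalize(parsed)

# Attach the reaction columns to the remaining columns.
result = pd.concat([df.drop(columns="Reaction"), v_norm], axis=1)
print(result)
```

For flat dicts like these the two calls should give the same values; `json_normalize` mainly pays off when the dicts are themselves nested.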

On the other hand, if you have a column of dictionaries, then this should be faster:

v = pd.DataFrame.from_records(df.Reaction.tolist())
pd.concat([df.drop(columns='Reaction'), v], axis=1)

   PostID   Reach  anger  haha   like   wow
0       1  787767      3  3200  49852  8017
1       2  973183     15  8017  57911  3013
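
One thing to watch for: `pd.concat(..., axis=1)` aligns rows on the index, and a frame built from a list of dicts gets a fresh 0..n-1 index. If the original frame has a different index (for example after filtering), the rows may not line up. The sketch below assumes a column of dicts and a hypothetical non-default index, and it uses the plain `pd.DataFrame` constructor with `index=df.index` rather than `from_records`.

```python
import pandas as pd

# Sample data rebuilt from the question, with a hypothetical non-default index.
df = pd.DataFrame(
    {
        "PostID": ["01", "02"],
        "Reach": [787767, 973183],
        "Reaction": [
            {"like": 49852, "wow": 8017, "haha": 3200, "anger": 3},
            {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
        ],
    },
    index=[10, 20],
)

# Reuse the original index so that concat lines the rows up correctly.
v = pd.DataFrame(df["Reaction"].tolist(), index=df.index)

result = pd.concat([df.drop(columns="Reaction"), v], axis=1)
print(result)
```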
