In Python, I'm working with a dataset to determine how user reactions relate to post reach. The dataset is structured as follows, with the Reaction column nested:

   PostID    Reach    Reaction
   01        787767   {"like":49852,"wow":8017,"haha":3200,"anger":3}
   02        973183   {"like":57911,"wow":3013,"haha":8017,"anger":15}
   03        ...      ...

I want to restructure the data into separate reaction columns so that the dataframe looks like this:

   PostID    Reach    like     wow     haha     anger
   01        787767   49852    8017    3200     3
   02        973183   57911    3013    8017     15
   03        ...      ...
2
  • Is that a column of JSON or dicts? Commented Jan 14, 2018 at 22:27
  • The column is json, thx for the answers. Both worked! Commented Jan 15, 2018 at 0:36

2 Answers

5

Convert the dictionaries to Pandas Series:

pd.concat([df.iloc[:,:2], df.Reaction.apply(pd.Series)],axis=1)
#   PostID   Reach  anger  haha   like   wow
#0       1  787767      3  3200  49852  8017
#1       2  973183     15  8017  57911  3013
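
For reference, here is a minimal self-contained sketch of this approach. The sample frame is rebuilt from the two rows in the question, and it assumes the Reaction column already holds Python dicts rather than JSON strings. It drops the column by name instead of slicing with iloc, so it does not depend on column positions.

```python
import pandas as pd

# Sample data rebuilt from the question; Reaction is assumed to hold dicts.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        {"like": 49852, "wow": 8017, "haha": 3200, "anger": 3},
        {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
    ],
})

# Expand each dict into its own row of columns (one column per key).
reactions = df["Reaction"].apply(pd.Series)

# Drop the nested column by name and attach the expanded columns.
result = pd.concat([df.drop(columns="Reaction"), reactions], axis=1)
print(result)
```

The order of the reaction columns may differ between pandas versions, so select them by name rather than by position.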

5 Comments

apply(pd.Series) is slow, and won't work if it's JSON data.
@cᴏʟᴅsᴘᴇᴇᴅ Why do you think it's JSON data? Any JSON would become a dict before it becomes a part of a DataFrame.
@cᴏʟᴅsᴘᴇᴇᴅ What is JSON data in Python?
Also, no guarantee that df.iloc[:,:2] should work for OP's actual data, drop would be better here.
I'm sorry, I wasn't clear. What I meant to say is that I have reason to believe that it might be a column of strings, is all. Anyway, my answer caters to both situations.
2

There are several ways to do this, assuming you have a column of JSON strings. One simple way is to apply json.loads to convert each string to a dict, and then load the result with DataFrame.from_records or json_normalize.

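import json

# Parse each JSON string into a dict, then build a frame from the list of dicts.
v = pd.DataFrame.from_records(df.Reaction.apply(json.loads).tolist())

Or,

# pd.json_normalize is the top-level name in pandas 1.0 and later.
v = pd.json_normalize(df.Reaction.apply(json.loads).tolist())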

Finally, concatenate the result.

pd.concat([df.drop(columns='Reaction'), v], axis=1)

   PostID   Reach  anger  haha   like   wow
0       1  787767      3  3200  49852  8017
1       2  973183     15  8017  57911  3013
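
Put together, the JSON-string case might look like the sketch below. The sample frame is rebuilt from the question's two rows and assumes Reaction holds JSON strings. It also assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import json

import pandas as pd

# Sample data rebuilt from the question; Reaction is assumed to hold JSON strings.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        '{"like":49852,"wow":8017,"haha":3200,"anger":3}',
        '{"like":57911,"wow":3013,"haha":8017,"anger":15}',
    ],
})

# Parse every string into a dict.
parsed = df["Reaction"].apply(json.loads)

# Flatten the list of dicts into one column per key.
v = pd.json_normalize(parsed.tolist())

# Replace the nested column with the flattened columns.
result = pd.concat([df.drop(columns="Reaction"), v], axis=1)
print(result)
```

Note that v gets a fresh 0..n-1 index, so the concat only lines up row by row if df also has a default index. If it does not, setting v.index = df.index before concatenating is one way to keep the rows aligned.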

On the other hand, if you have a column of dictionaries, then this should be faster:

# The values are already dicts, so no parsing step is needed.
v = pd.DataFrame.from_records(df.Reaction.tolist())
pd.concat([df.drop(columns='Reaction'), v], axis=1)

   PostID   Reach  anger  haha   like   wow
0       1  787767      3  3200  49852  8017
1       2  973183     15  8017  57911  3013
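
If you are not sure which of the two situations applies, a small helper can cover both. This is only a sketch: expand_reactions is a hypothetical name, not a pandas function, and it assumes every value in the column is either a JSON string or a dict. It builds the new columns on the original index, so it also works when the frame does not have a default index.

```python
import json

import pandas as pd


def expand_reactions(df, column="Reaction"):
    """Hypothetical helper: expand a column of JSON strings or dicts into columns."""
    # Parse strings with json.loads and pass dicts through unchanged.
    records = [json.loads(x) if isinstance(x, str) else x for x in df[column]]
    # Reuse the original index so the rows stay aligned in the concat.
    expanded = pd.DataFrame(records, index=df.index)
    return pd.concat([df.drop(columns=column), expanded], axis=1)


# Sample data rebuilt from the question, with one string and one dict,
# and a non-default index to show that alignment is preserved.
df = pd.DataFrame(
    {
        "PostID": ["01", "02"],
        "Reach": [787767, 973183],
        "Reaction": [
            '{"like":49852,"wow":8017,"haha":3200,"anger":3}',
            {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
        ],
    },
    index=[10, 20],
)

result = expand_reactions(df)
print(result)
```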
