3

How can I simply separate a JSON column inside pandas:

pd.DataFrame({
    'col1':[1,2], 
    'col2':["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
            "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}"]})

   col1                                        col2
0     1  {'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}
1     2  {'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}

into real columns in a simple, Pythonic way?

edit

Desired output:

pd.DataFrame({'col1':[1,2], 'foo':[1,3], 'bar':[2,5], 
              'baz_foo':[2,2], 'baz_x':[1,1]})

   col1  foo  bar  baz_foo  baz_x
0     1    1    2        2      1
1     2    3    5        2      1
4
  • Is the inconsistent quoting in your JSON-like col2 actually what you are looking to parse? Instantiating the DataFrame you provide works, but taking the next step using ast.literal_eval doesn't work because that's not a valid dictionary. Haven't tried the json library actually... Commented Dec 3, 2018 at 21:07
  • no. updated the data. Commented Dec 3, 2018 at 21:08
  • Can you include your desired output as well, just to be clear? Commented Dec 3, 2018 at 21:10
  • added it to the question. Commented Dec 3, 2018 at 21:12

2 Answers

5

json_normalize is the right way to tackle nested JSON data.

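import ast
import pandas as pd

# Parse each dict-literal string, then flatten nested keys into "baz_foo"-style columns.
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import.
v = pd.json_normalize([ast.literal_eval(j) for j in df.pop('col2')], sep='_')
pd.concat([df, v], axis=1)

   col1  foo  bar  baz_foo  baz_x
0     1    1    2        2      1
1     2    3    5        2      1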

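Note that you still have to convert each string to a dictionary first. The values here are single-quoted Python dict literals rather than strict JSON, which is why ast.literal_eval is used instead of json.loads.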
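To illustrate the conversion step: strict JSON requires double-quoted keys, so json.loads rejects the single-quoted strings from the question, while ast.literal_eval accepts them as Python literals. A minimal sketch, assuming each cell holds either form; parse_cell is a hypothetical helper name, not part of pandas or the standard library.

```python
import ast
import json


def parse_cell(text):
    """Parse a cell as strict JSON first, falling back to a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted keys are not valid JSON, but they are valid Python literals.
        return ast.literal_eval(text)


single_quoted = "{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}"
double_quoted = '{"foo": 1, "bar": 2, "baz": {"foo": 2, "x": 1}}'

parsed_single = parse_cell(single_quoted)
parsed_double = parse_cell(double_quoted)
```

Both calls return the same nested dict, which can then be passed to json_normalize.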


If you want to handle NaNs in "col2", try using join at the end:

import ast
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1':[1,2,3], 
    'col2':["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
            "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}", 
            np.nan]})

# Normalize only the non-null rows, then realign them to the original index.
v = pd.json_normalize([
    ast.literal_eval(j) for j in df['col2'].dropna()], sep='_'
)
v.index = df.index[df.pop('col2').notna()]

df.join(v, how='left')

   col1  foo  bar  baz_foo  baz_x
0     1  1.0  2.0      2.0    1.0
1     2  3.0  5.0      2.0    1.0
2     3  NaN  NaN      NaN    NaN
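
The same steps can be wrapped in a small function that leaves the input frame untouched. This is only a sketch, assuming the cells hold Python dict literals as strings; expand_dict_column is a hypothetical helper name, not a pandas function. A None inside a dict should simply come out as a missing value in the flattened column.

```python
import ast

import numpy as np
import pandas as pd


def expand_dict_column(frame, column, sep="_"):
    """Return a copy of `frame` with `column` (dict-literal strings) flattened."""
    present = frame[column].notna()
    # Parse only the rows that actually contain a string.
    parsed = [ast.literal_eval(s) for s in frame.loc[present, column]]
    flat = pd.json_normalize(parsed, sep=sep)
    # Realign the flattened rows with the rows they came from.
    flat.index = frame.index[present]
    return frame.drop(columns=column).join(flat, how="left")


df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
             "{'foo':3, 'bar':None, 'baz':{'foo':2, 'x':1}}",
             np.nan],
})
result = expand_dict_column(df, "col2")
```

Because the function drops the column on a copy rather than calling pop, the original df still contains col2 afterwards.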

6 Comments

I get a 'ValueError: malformed node or string: <_ast.Name object at 0x1031d1278> ' for your code.
@GeorgHeiler That's because your JSON is malformed. What do you want to do about it?
As I mentioned above, that bug was not intentional. Valid JSON should be assumed; I have fixed the dummy data.
@GeorgHeiler Made a slight edit. Try running my code with the initialisation provided in my post?
Indeed. This is great. However, this does not yet handle None. Would you mind adding it? I know it was lacking in the minimal sample, but would be great for a more complete sample.
0

json_normalize flattens nested JSON-like dictionaries into a table. The nesting path is used to build the column names, joined with "." by default (for example baz.foo) or with whatever separator is passed as sep.

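import pandas as pd

data = {'col1':[1,2,3], 
        'col2':[{'foo': 1, 'bar': 2, 'baz': {'foo': 2, 'x': 1}},
                {'foo': 3, 'bar': 5, 'baz': {'foo': None, 'x': 1}}]}

# col2 already holds dicts here, so no parsing step is needed.
# sep="_" yields baz_foo / baz_x instead of the default baz.foo / baz.x.
# col1 has three entries but col2 only two, so the third row is NaN after the join.
pd.DataFrame(data={"col1": data["col1"]})\
  .join(pd.json_normalize(data["col2"], sep="_"))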
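
To make the column naming concrete, here is a short sketch of how the nesting path turns into column names. It assumes a reasonably recent pandas where pd.json_normalize is available at the top level; the variable names are illustrative only.

```python
import pandas as pd

records = [
    {"foo": 1, "bar": 2, "baz": {"foo": 2, "x": 1}},
    {"foo": 3, "bar": 5, "baz": {"foo": None, "x": 1}},
]

# Default: nested keys are joined with ".", giving "baz.foo" and "baz.x".
dotted = pd.json_normalize(records)

# A custom separator gives "baz_foo" and "baz_x".
underscored = pd.json_normalize(records, sep="_")

# max_level=0 stops flattening at the top level, so "baz" stays a dict column.
top_level = pd.json_normalize(records, max_level=0)
```

The sep="_" variant produces the column names asked for in the question.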
