Converting JSON file to Pandas dataframe

Question

I have a json file which I converted to dict like below:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}

I wrote the following code to convert a json file to above dict format

with open('data.json') as data_file: 
    data = json.load(data_file)

Now I am trying to store this dict as pandas data frame with keys as column headers.

So I write below:

df=pd.DataFrame.from_dict(data,orient='columns')

But I get all columns as one column.

df.head(3)

    DATA
0   {'COMPANY_SCHEMA': 'ABC.', 'CON...
1   {'COMPANY_SCHEMA': 'ABC', 'CON...
2   {'COMPANY_SCHEMA': 'ABC', 'CON...

I basically have a bunch of such json files in a folder and I am trying to read all of them and store in one pandas data frame appended one below the other.

So I was trying above. So

1) why the above error when converting to pandas data frame and

ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?

CandleWax · Accepted Answer · 2018-05-07 21:23:02Z

1

Not sure about why you are getting the error you show, but I would skip converting the json to a dictionary and just use pd.read_json() instead.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html

answered May 7, 2018 at 21:23

CandleWax

2,2392 gold badges32 silver badges55 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Bowen Liu Over a year ago

I get an Expected String or Unicode error from that.

Peque · Accepted Answer · 2018-05-07 22:47:18Z

1

The data you provide is broken, so it is hard to reproduce. Try to provide a reproducible case when asking! ;-)

Anyway I guess you just need to:

df = pandas.DataFrame(data['DATA'])

Where data is the dictionary you created with json.load().

A pandas.DataFrame() can be initialized with a list of dictionaries with no problem, but you need to pass the list of dictionaries.

If you are concerned about performance then yeah, append to your list of dictionaries first and convert the whole list to a DataFrame with pandas.DataFrame(list_of_dictionaries).

answered May 7, 2018 at 22:47

Peque

15.1k11 gold badges75 silver badges117 bronze badges

3 Comments

Baktaawar Over a year ago

this is helpful. Do we have append function for dict so that dict overlapping keys are added and not get overwritten?

Baktaawar Over a year ago

Tis doesn't append. It overlaps. The result I want is d={'a':[1,2],'b':[2,0],'c':1}

Peque Over a year ago

@Baktaawar That would be a different question, not even related to Pandas. ;-) Although if you are going to create a DataFrame then you probably want to simply append new dictionaries to a list of dictionaries, as I mentioned.

Community · Accepted Answer · 2021-10-07 12:12:37Z

Your data is broken. After analyzing your question, I construct one like following:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}, {'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}]}

Since you only give the converted dict and JSON specification - RFC7159 states that a string begins and ends with quotation mark which is ". I just take the dict as an example.

I use ast.literal_eval() to safely get a data structure from a string, which is dict same with your json.load(). After getting a dict object, there are various ways to convert it to dataframe.

import ast
import pandas as pd


with open('data.dict') as data_file:
    dict_data = ast.literal_eval(data_file.read())

# The following methods all produce the same output:
pd.DataFrame(dict_data['DATA'])
pd.DataFrame.from_dict(dict_data['DATA'])
pd.DataFrame.from_records(dict_data['DATA'])

# print(pd.DataFrame(dict_data['DATA']))
  COMPANY_SCHEMA CONFIG_TYPE          IM_ID MODIFIED_DATE    ID CONFIG_KEY SYSTEM_NUMBER SYS_CONFIG_VALUE SYS_CONFIG_STRING_VALUE
0            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true
1            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true

why the above error when converting to pandas data frame

If you mean why there's only one column, that's pandas.DataFrame.from_dict() treats the keys of the dict as the DataFrame columns by default. If you do df=pd.DataFrame.from_dict(data), whose key is DATA. So there is only one column named DATA

ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?

My solution is to concat all the dict data to one list:

with open('data1.json') as data_file: 
    dict_data1 = json.load(data_file)

....

data = dict_data1['DATA'] + dict_data2['DATA']

# Convert to pandas dataframe
pd.DataFrame(data)

# Dump the data to json file
with open('result.json', 'w') as fp:
    json.dump({'DATA': data}, fp)

You could use a for loop to simplify the procedure.

Collectives™ on Stack Overflow

Converting JSON file to Pandas dataframe

3 Answers 3

1 Comment

3 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

3 Comments

Comments

Related