0

I have a json file which I converted to dict like below:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}

I wrote the following code to convert a json file to above dict format

with open('data.json') as data_file: 
    data = json.load(data_file)

Now I am trying to store this dict as pandas data frame with keys as column headers.

So I write below:

df=pd.DataFrame.from_dict(data,orient='columns')

But I get all columns as one column.

df.head(3)

    DATA
0   {'COMPANY_SCHEMA': 'ABC.', 'CON...
1   {'COMPANY_SCHEMA': 'ABC', 'CON...
2   {'COMPANY_SCHEMA': 'ABC', 'CON...

I basically have a bunch of such json files in a folder and I am trying to read all of them and store in one pandas data frame appended one below the other.

So I was trying above. So

1) why the above error when converting to pandas data frame and

ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?

3 Answers 3

1

Not sure about why you are getting the error you show, but I would skip converting the json to a dictionary and just use pd.read_json() instead.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html

Sign up to request clarification or add additional context in comments.

1 Comment

I get an Expected String or Unicode error from that.
1

The data you provide is broken, so it is hard to reproduce. Try to provide a reproducible case when asking! ;-)

Anyway I guess you just need to:

df = pandas.DataFrame(data['DATA'])

Where data is the dictionary you created with json.load().

A pandas.DataFrame() can be initialized with a list of dictionaries with no problem, but you need to pass the list of dictionaries.

If you are concerned about performance then yeah, append to your list of dictionaries first and convert the whole list to a DataFrame with pandas.DataFrame(list_of_dictionaries).

3 Comments

this is helpful. Do we have append function for dict so that dict overlapping keys are added and not get overwritten?
Tis doesn't append. It overlaps. The result I want is d={'a':[1,2],'b':[2,0],'c':1}
@Baktaawar That would be a different question, not even related to Pandas. ;-) Although if you are going to create a DataFrame then you probably want to simply append new dictionaries to a list of dictionaries, as I mentioned.
1

Your data is broken. After analyzing your question, I construct one like following:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}, {'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}]}

Since you only give the converted dict and JSON specification - RFC7159 states that a string begins and ends with quotation mark which is ". I just take the dict as an example.

I use ast.literal_eval() to safely get a data structure from a string, which is dict same with your json.load(). After getting a dict object, there are various ways to convert it to dataframe.

import ast
import pandas as pd


with open('data.dict') as data_file:
    dict_data = ast.literal_eval(data_file.read())

# The following methods all produce the same output:
pd.DataFrame(dict_data['DATA'])
pd.DataFrame.from_dict(dict_data['DATA'])
pd.DataFrame.from_records(dict_data['DATA'])
# print(pd.DataFrame(dict_data['DATA']))
  COMPANY_SCHEMA CONFIG_TYPE          IM_ID MODIFIED_DATE    ID CONFIG_KEY SYSTEM_NUMBER SYS_CONFIG_VALUE SYS_CONFIG_STRING_VALUE
0            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true
1            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true
  1. why the above error when converting to pandas data frame

If you mean why there's only one column, that's pandas.DataFrame.from_dict() treats the keys of the dict as the DataFrame columns by default. If you do df=pd.DataFrame.from_dict(data), whose key is DATA. So there is only one column named DATA

ii) Is there a better and faster way to read a bunch of such files and append to one json and then add it to pandas frame or one by one?

My solution is to concat all the dict data to one list:

with open('data1.json') as data_file: 
    dict_data1 = json.load(data_file)

....

data = dict_data1['DATA'] + dict_data2['DATA']

# Convert to pandas dataframe
pd.DataFrame(data)

# Dump the data to json file
with open('result.json', 'w') as fp:
    json.dump({'DATA': data}, fp)

You could use a for loop to simplify the procedure.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.