Extract key and value from json to new dataframe

Question

I have a dataframe with JSON values in its columns. They are nested several levels deep. I would like to extract the innermost keys and values into a new dataframe. A sample column value is below:

{'shipping_assignments': [{'shipping': {'address': {'address_type': 'shipping', 'city': 'Calder', 'country_id': 'US', 'customer_address_id': 1, 'email': '[email protected]', 'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello', 'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan', 'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff Parkway'], 'telephone': '(555) 229-3326'}, 'method': 'flatrate_flatrate', 'total': {'base_shipping_amount': 5, 'base_shipping_discount_amount': 0, 'base_shipping_discount_tax_compensation_amnt': 0, 'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5, 'base_shipping_tax_amount': 0, 'shipping_amount': 5, 'shipping_discount_amount': 0, 'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax': 5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items': [{'amount_refunded': 0, 'applied_rule_ids': '1', 'base_amount_refunded': 0, 'base_discount_amount': 0, 'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount': 0, 'base_discount_tax_compensation_invoiced': 0, 'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax': 31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39, 'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45', 'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0, 'free_shipping': 0, 'discount_tax_compensation_amount': 0, 'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0, 'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0, 'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax': 31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1, 'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total': 29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku': 'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced': 2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 
'weight': 1, 'product_option': {'extension_attributes': {'configurable_item_options': [{'option_id': '141', 'option_value': 167}, {'option_id': '93', 'option_value': 58}]}}}]}], 'payment_additional_info': [{'key': 'method_title', 'value': 'Check / Money order'}], 'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}

The above is a single row value of the dataframe column df['x'].

My code to convert it is below:

import json
import pandas as pd

sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)

It gives a new dataframe with these columns:

Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes', 'payment_additional_info', 'shipping_assignments'], dtype='object')

When I tried to do the same to convert the applied_taxes column, whose rows hold lists of dicts:

import re

m_df = df['applied_taxes'].apply(lambda x: re.sub(r'.?\[|$.|]', "", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)

It doesn't work.

Check this link to get the beautified JSON.

Vijayaraghavan · Accepted Answer · 2019-10-21 05:13:44Z

2

I came across a beautiful ETL package for Python called petl. Its fromdicts() function turns a list of dicts into a table (it takes parsed Python dicts, not a JSON string):

from petl import fromdicts
order_table = fromdicts(data_list)
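For comparison, if you would rather stay in pandas, the closest analogue of fromdicts() is the DataFrame constructor applied to a list of dicts. A minimal sketch, assuming the data arrives as a JSON string; json_string and its trimmed content are illustrative, not from the original:

```python
import json

import pandas as pd

# Illustrative JSON text, trimmed from the question's sample (an assumption, not real data).
json_string = (
    '[{"converting_from_quote": true,'
    ' "applied_taxes": [{"code": "US-MI-*-Rate 1", "amount": 2.39}]}]'
)

# Table builders like fromdicts() work on Python dicts, so parse the text first.
data_list = json.loads(json_string)

# One row per dict, one column per top-level key. Nested values such as the
# applied_taxes list are NOT expanded; each cell still holds a Python list.
frame = pd.DataFrame(data_list)
print(frame)
```

Because nested values are left as-is, a second step is still needed to spread applied_taxes into columns, which is what unpackdict does on the petl side.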

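If any column contains nested dicts, unpackdict(order_table, 'nested_col') will unpack them. In my case I needed to unpack the applied_taxes column; the code below unpacks it and appends each key as a column, with its value in the same row of the table. Note that unpackdict expects dict values, while in the question's sample each applied_taxes cell is a list holding one dict, so it may need to be reduced to that dict first (for example with petl's convert).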

from petl import unpackdict
order_table = unpackdict(order_table, 'applied_taxes')
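A rough pandas counterpart of the unpackdict step, swapping in DataFrame.explode and pandas.json_normalize rather than petl. It assumes, as in the question's sample, that each applied_taxes cell holds a list of dicts, so the list is exploded to one dict per row before its keys become columns; orders is a trimmed, hypothetical stand-in for the real data:

```python
import pandas as pd

# Trimmed, hypothetical stand-in for the question's records.
orders = [
    {
        "applied_taxes": [
            {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
             "percent": 8.25, "amount": 2.39, "base_amount": 2.39}
        ],
        "converting_from_quote": True,
    }
]
orders_df = pd.DataFrame(orders)

# Each applied_taxes cell is a list of dicts: give every dict its own row first.
# Assumes no empty lists; explode turns [] into NaN, which would need handling.
exploded = orders_df.explode("applied_taxes", ignore_index=True)

# Spread each dict's keys into columns and put them back next to the other fields.
taxes = pd.json_normalize(exploded["applied_taxes"].tolist())
result = pd.concat([exploded.drop(columns="applied_taxes"), taxes], axis=1)
print(result)
```

Each dict in a list gets its own row, and the remaining order columns are repeated alongside it.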

If you want to know more, see the petl documentation.

edited Oct 21, 2019 at 5:13

answered Oct 17, 2019 at 11:34

Vijayaraghavan


Kostas Charitidis · Accepted Answer · 2019-10-16 12:23:10Z

It seems that your mistake was in tolist(). Try the following:

import pandas as pd
import json
import re

data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"[email protected]","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}

df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x: re.sub(r'.?\[|$.|]', "", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)

This prints the following:

                                                   0
0  {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...
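The output is a single column because the regex step calls str(x): every cell becomes a Python repr string, so read_json has nothing structured left to split. A sketch that skips the string round-trip and lets pandas.json_normalize walk into the list directly, assuming the raw records are available as a list of dicts (orders below is a trimmed, hypothetical stand-in for df['x'].tolist()):

```python
import pandas as pd

# Trimmed, hypothetical stand-in for df['x'].tolist(): a list of order dicts.
orders = [
    {
        "applied_taxes": [
            {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
             "percent": 8.25, "amount": 2.39, "base_amount": 2.39}
        ],
        "converting_from_quote": True,
    }
]

# record_path descends into each order's applied_taxes list and makes one row per
# tax dict; meta copies a top-level field alongside so rows can be traced to orders.
taxes = pd.json_normalize(orders, record_path="applied_taxes",
                          meta=["converting_from_quote"])
print(taxes)
```

Here code, title, percent and so on become column names and their values land in the rows, without any regex.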

We need the innermost keys and values of the rows as a new dataframe: for example, code as a column and its value in the row, and so on.
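To get every innermost key and value rather than one column's worth, one option is a small recursive walk. A sketch, where leaf_items is a hypothetical helper (not part of pandas) that turns each leaf into a dotted-path column and addresses list elements by position; record is a trimmed stand-in for one df['x'] value:

```python
import pandas as pd


def leaf_items(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf of nested dicts/lists.

    List elements are addressed by position, e.g. 'applied_taxes.0.code'.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from leaf_items(value, path)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            path = f"{prefix}.{i}" if prefix else str(i)
            yield from leaf_items(value, path)
    else:
        # Scalar: this is an "end" key/value pair.
        yield prefix, obj


# Trimmed stand-in for one value of df['x'].
record = {
    "shipping_assignments": [
        {"shipping": {"address": {"city": "Calder",
                                  "street": ["6146 Honey Bluff Parkway"]},
                      "method": "flatrate_flatrate"}}
    ],
    "applied_taxes": [{"code": "US-MI-*-Rate 1", "percent": 8.25, "amount": 2.39}],
    "converting_from_quote": True,
}

# One row, one column per leaf path.
flat = pd.DataFrame([dict(leaf_items(record))])
print(flat.T)
```

For the whole column, something like pd.DataFrame([dict(leaf_items(r)) for r in df['x']]) would give one row per order; orders with more items simply produce extra columns such as shipping_assignments.0.items.1.sku.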
