0

I have a dataframe that has JSON values are in columns. Those were indented into multiple levels. I would like to extract the end key and value into a new dataframe. I will give you sample column values below

{'shipping_assignments': [{'shipping': {'address': {'address_type': 'shipping', 'city': 'Calder', 'country_id': 'US', 'customer_address_id': 1, 'email': '[email protected]', 'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello', 'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan', 'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff Parkway'], 'telephone': '(555) 229-3326'}, 'method': 'flatrate_flatrate', 'total': {'base_shipping_amount': 5, 'base_shipping_discount_amount': 0, 'base_shipping_discount_tax_compensation_amnt': 0, 'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5, 'base_shipping_tax_amount': 0, 'shipping_amount': 5, 'shipping_discount_amount': 0, 'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax': 5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items': [{'amount_refunded': 0, 'applied_rule_ids': '1', 'base_amount_refunded': 0, 'base_discount_amount': 0, 'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount': 0, 'base_discount_tax_compensation_invoiced': 0, 'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax': 31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39, 'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45', 'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0, 'free_shipping': 0, 'discount_tax_compensation_amount': 0, 'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0, 'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0, 'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax': 31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1, 'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total': 29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku': 'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced': 2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 'weight': 1, 'product_option': {'extension_attributes': {'configurable_item_options': [{'option_id': '141', 'option_value': 167}, {'option_id': '93', 'option_value': 58}]}}}]}], 'payment_additional_info': [{'key': 'method_title', 'value': 'Check / Money order'}], 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}

Above is single row value of the dataframe column df['x']

My codes are below to convert

sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)

it gives new dataframe with columns

Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes', 'payment_additional_info', 'shipping_assignments'], dtype='object')

When I tried to do the same above to convert the column which has row values

m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)

It doesn't work

Check this link to get the beautified_json

2 Answers 2

2

I came across a beautiful ETL package in python called petl. convert the json list into dict form with the help of function called fromdicts(json_string)

order_table = fromdicts(data_list)

If you find any nested dict in any of the columns, use unpackdict(order_table,'nested_col') it will unpack the nested dict. In my case, I need to unpack the applied_tax column. Below code will unpack and append the key and value as a column and row in the same table.

order_table  = unpackdict(order_table, 'applied_taxes')

If you guys wants to know more about -petl

Sign up to request clarification or add additional context in comments.

Comments

1

It seems that your mistake was in tolist(). Try the following:

import pandas as pd
import json
import re

data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"[email protected]","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}

df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)

prints the following:

                                                   0
0  {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...

1 Comment

we need the end key and value of the rows as a new dataframe. For example, Code as column and its value as row and so on

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.