Adding labels and fields to a nested JSON

Question

I have a dataframe:

Name_ID | URL                    | Count | Rating
------------------------------------------------
ABC     | www.example.com/ABC    | 10    | 5
123     | www.example.com/123    | 9     | 4
XYZ     | www.example.com/XYZ    | 5     | 2
ABC111  | www.example.com/ABC111 | 5     | 2
ABC121  | www.example.com/ABC121 | 5     | 2
222     | www.example.com/222    | 5     | 3
abc222  | www.example.com/abc222 | 4     | 2
ABCaaa  | www.example.com/ABCaaa | 4     | 2

I am trying to create JSON like this:

{
    "name": "sampledata",
    "children": [
        {
            "name": 9,
            "children": [
                {
                    "name": 4,
                    "children": [
                        {"name": "123", "size": 100}
                    ]
                }
            ]
        },
        {
            "name": 10,
            "children": [
                {
                    "name": 5,
                    "children": [
                        {"name": "ABC", "size": 100}
                    ]
                }
            ]
        },
        {
            "name": 4,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {"name": "abc222", "size": 50},
                        {"name": "ABCaaa", "size": 50}
                    ]
                }
            ]
        },
        {
            "name": 5,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {"name": "XYZ", "size": 16},
                        {"name": "ABC111", "size": 16},
                        {"name": "ABC121", "size": 16}
                    ]
                },
                {
                    "name": 3,
                    "children": [
                        {"name": "222", "size": 50}
                    ]
                }
            ]
        }
    ]
}

In order to do that:

1. I am trying to add labels such as "name" and "children" to the JSON while creating it.
   I tried something like results = [{"name": i, "children": j} for i,j in results.items()], but I don't think that labels the levels properly.
2. I want to add another field labelled "size", which I plan to calculate with the formula (Rating*Count*10000)/number_of_children_to_the_immediate_parent.
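As a rough illustration of the size formula (a sketch, not part of the original question): if the immediate parent of each Name_ID leaf is its (Count, Rating) pair, as in the target JSON above, then the number of children of that parent is simply the size of each (Count, Rating) group, and the whole column can be computed up front. The new column name `size` is an assumption of this sketch.

```python
import pandas as pd

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4),
        ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2),
        ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3),
        ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]
df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

# Children of the immediate parent = rows sharing the same (Count, Rating) pair.
# transform('size') broadcasts each group's row count back onto every row of that group.
siblings = df.groupby(['Count', 'Rating'])['Name'].transform('size')

# size = (Rating * Count * 10000) / number_of_children_to_the_immediate_parent
df['size'] = df['Rating'] * df['Count'] * 10000 / siblings

print(df[['Name', 'Count', 'Rating', 'size']])
```

Computing the column once like this means the JSON-building loops only have to copy `size` into each leaf instead of recomputing it.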

import pandas as pd
from collections import defaultdict
import json

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4), ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2), ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3), ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]

df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

gp = df.groupby('Count')

dict_json = {"name": "flare"}
children = []

for name, group in gp:
    temp = {}
    temp["name"] = int(name)  # NumPy int -> plain int, so json.dumps accepts it
    temp["children"] = []

    rgp = group.groupby('Rating')
    for n, g in rgp:
        temp2 = {}
        temp2["name"] = int(n)
        # list() because dict.values() is a view in Python 3 and is not JSON-serializable
        temp2["children"] = list(g.reset_index().T.to_dict().values())
        for t in temp2["children"]:
            t["size"] = (t["Rating"] * t["Count"] * 10000) / len(temp2["children"])
            t["name"] = t["Name"]
            del t["Count"]
            del t["Rating"]
            del t["URL"]
            del t["Name"]
            del t["index"]
        temp["children"].append(temp2)
    children.append(temp)

dict_json["children"] = children

print(json.dumps(dict_json, indent=4))

Though the code does print what I need, I am looking for a more efficient and cleaner way to do the same, mainly because the actual dataset might be even more nested and complicated.
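Since the real data may be nested more deeply, one possible approach is to make the grouping recursive, so that the number of levels is just the length of a list of column names. The helper `nest` below is hypothetical; it is a sketch that assumes integer-valued grouping columns (it calls `int()` on each key) and leaves built from the `Name`, `Rating` and `Count` columns with the question's size formula.

```python
import json
import pandas as pd

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4),
        ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2),
        ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3),
        ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]
df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])


def nest(frame, keys):
    """Hypothetical helper: recursively group `frame` by each column in `keys`.

    Every grouping level becomes {"name": <key>, "children": [...]};
    the leaves are {"name": <Name>, "size": ...} using the question's formula.
    Assumes the grouping columns hold integers.
    """
    if not keys:
        n = len(frame)  # number of children of the immediate parent
        return [
            {'name': row.Name, 'size': row.Rating * row.Count * 10000 / n}
            for row in frame.itertuples(index=False)
        ]
    first, rest = keys[0], keys[1:]
    return [
        # int() turns the NumPy integer key into a plain int for json.dumps
        {'name': int(value), 'children': nest(group, rest)}
        for value, group in frame.groupby(first)
    ]


result = {'name': 'flare', 'children': nest(df, ['Count', 'Rating'])}
print(json.dumps(result, indent=4))
```

Adding a level then only means adding a column name to the list. Note that `groupby` sorts its keys by default, so the top level comes out as 4, 5, 9, 10 rather than in the order shown in the target JSON.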

I have a hard time understanding how the dataframe relates to the JSON. I understand it's organized by data > count > rating > name_id, but why remove those names and use the meaningless 'name' identifier at each level?
– 301_Moved_Permanently, Commented Dec 22, 2016 at 19:27
Is the pandas starting point important? If so, why isn't it a tag? Also, this kind of question might get more attention on SO. I know there's a lot more numpy interest there; that's probably true for pandas as well.
– hpaulj, Commented Dec 24, 2016 at 0:22
Are you aware that df.to_json(....) is available? The default doesn't look like what you want, but some combination of parameters might do the job.
– hpaulj, Commented Dec 24, 2016 at 0:26
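For reference, this is roughly what the `to_json` route gives with `orient='records'`, one of its documented orientations. It is only a sketch to show the shape of the output: one flat object per row, with no grouping by Count or Rating.

```python
import json
import pandas as pd

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4),
        ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2),
        ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3),
        ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]
df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

# orient='records' serializes each row as its own object; the Count -> Rating -> Name
# nesting would still have to be built separately.
records = json.loads(df.to_json(orient='records'))
print(records[0])
```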

hpaulj · Accepted Answer · 2016-12-24 03:52:19Z

2

The rgp loop can be made more compact, and a bit faster, with:

def foo2(rgp):
    alist = []
    for n, g in rgp:
        temp2 = {"name": n}
        values = g.T.to_dict().values()
        n = len(values)
        def size(t): 
            return (t['Rating'] * t['Count'] * 10000) / n
        temp3 = [{'name': t['Name'], 'size': size(t)} for t in values]
        temp2['children'] = temp3
        alist.append(temp2)
    return alist

I don't have enough experience with Pandas to know whether the groupby can be improved. For example, would it be possible to perform a two-level grouping with one call, i.e. group on 'Count' and, within that, on 'Rating'?
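pandas does accept a list of columns in a single `groupby` call, and iterating over it yields `(count, rating)` tuple keys, so the two nested loops can become one loop plus a dictionary keyed on Count. A sketch, assuming the grouping columns hold integers and Python 3.7+ dict ordering; `count_nodes` is an illustrative name.

```python
import json
import pandas as pd

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4),
        ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2),
        ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3),
        ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]
df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

count_nodes = {}  # Count value -> {"name": ..., "children": [...]}

# One groupby over both columns; keys arrive sorted by Count, then by Rating.
for (count, rating), g in df.groupby(['Count', 'Rating']):
    count, rating = int(count), int(rating)  # plain ints for json.dumps
    node = count_nodes.setdefault(count, {'name': count, 'children': []})
    n = len(g)  # children of this Rating node
    node['children'].append({
        'name': rating,
        'children': [{'name': name, 'size': rating * count * 10000 / n}
                     for name in g['Name']],
    })

dict_json = {'name': 'flare', 'children': list(count_nodes.values())}
print(json.dumps(dict_json, indent=4))
```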

Considering that JSON is just a string version of a dict, and that you have a specific dictionary layout in mind, I don't see how you can organize the code in any other way. With the exception of update, all dictionary additions are key by key, so you need these two loops over the groups.

answered Dec 24, 2016 at 3:52

hpaulj


Nice answer. Couldn't you inline size, and merge the creation of temp2 with temp3?
– Peilonrayz ♦, Commented Dec 24, 2016 at 4:15
Certainly. Creating the size function just made the temp3 line a little clearer; it was purely a readability choice. And I could write temp2 = {'name': n, 'children': temp3}. That doesn't change the speed much.
– hpaulj, Commented Dec 24, 2016 at 5:02


Peilonrayz · Accepted Answer · 2016-12-24 05:26:14Z

Building on hpaulj's answer, I find that removing the temporary variables makes the code clearer, and it also makes the structure of your data much easier to see. So I'd change it to:

def foo2(rgp):
    list_ = []
    for name, g in rgp:
        values = g.T.to_dict().values()
        n = len(values)
        list_.append({
            'name': name,
            'children': [
                {
                    'name': t['Name'],
                    'size': (t['Rating'] * t['Count'] * 10000) / n
                }
                for t in values
            ]
        })
    return list_
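To show how `foo2` slots back into the question's code, here is a sketch of the whole pipeline with the outer Count loop written as a comprehension. It assumes Python 3 and a recent pandas; passing `default=int` to `json.dumps` is an addition here that converts the NumPy integer group keys, which the standard `json` module does not serialize on its own.

```python
import json
import pandas as pd

data = [('ABC', 'www.example.com/ABC', 10, 5), ('123', 'www.example.com/123', 9, 4),
        ('XYZ', 'www.example.com/XYZ', 5, 2), ('ABC111', 'www.example.com/ABC111', 5, 2),
        ('ABC121', 'www.example.com/ABC121', 5, 2), ('222', 'www.example.com/222', 5, 3),
        ('abc222', 'www.example.com/abc222', 4, 2), ('ABCaaa', 'www.example.com/ABCaaa', 4, 2)]
df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])


def foo2(rgp):
    # The version from the answer above, unchanged apart from line wrapping.
    list_ = []
    for name, g in rgp:
        values = g.T.to_dict().values()
        n = len(values)
        list_.append({
            'name': name,
            'children': [
                {'name': t['Name'], 'size': (t['Rating'] * t['Count'] * 10000) / n}
                for t in values
            ]
        })
    return list_


# The outer Count loop becomes one comprehension; foo2 handles each Count
# group's inner Rating grouping.
dict_json = {
    'name': 'flare',
    'children': [
        {'name': count, 'children': foo2(group.groupby('Rating'))}
        for count, group in df.groupby('Count')
    ],
}

# default=int is called only for objects json can't handle (here, NumPy integers).
print(json.dumps(dict_json, indent=4, default=int))
```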
