0

I have a dataframe with two columns : ID and Demographic_distribution. ID is just a number (ex:123456). demographic_impression has a list for each ID. here's an example of what you get in the demographic_distribution column for one ID :

ID : 123456

Demographic_distribution : [{percentage: 0.000952, age: 25-34, gender: unknown}, {percentage: 0.093621, age: 55-64, gender: male}, {percentage: 0.002856, age: 35-44, gender: unknown}, {percentage: 0.031736, age: 18-24, gender: female}, {percentage: 0.085052, age: 25-34, gender: male}, {percentage: 0.019994, age: 18-24, gender: male}, {percentage: 0.085687, age: 35-44, gender: male}, {percentage: 0.133608, age: 55-64, gender: female}, {percentage: 0.112345, age: 65+, gender: female}, {percentage: 0.000317, age: 18-24, gender: unknown}, {percentage: 0.095208, age: 45-54, gender: female}, {percentage: 0.067598, age: 65+, gender: male}, {percentage: 0.086004, age: 45-54, gender: male}, {percentage: 0.075849, age: 25-34, gender: female}, {percentage: 0.098699, age: 35-44, gender: female}, {percentage: 0.003174, age: 65+, gender: unknown}, {percentage: 0.003174, age: 45-54, gender: unknown}, {percentage: 0.004126, age: 55-64, gender: unknown}]

You can see that there are 5 age groups, 3 genders and many percentages. I would like to split the demographic column into three different columns for each parameters. Let's not forget that these informations are liked to an ID in each row, otherwise it doesn't make sense. I tried .explode, but it didn't work.

Any idea how to do this ?

3 Answers 3

2

json.normalize is what you are after.

pandas docs

Sign up to request clarification or add additional context in comments.

Comments

0

Try:

df = df.explode("Demographic_distribution")

df = pd.concat(
    [df, df.pop("Demographic_distribution").apply(pd.Series)], axis=1
)
print(df)

Prints:

       ID  percentage    age   gender
0  123456    0.000952  25-34  unknown
0  123456    0.093621  55-64     male
0  123456    0.002856  35-44  unknown
0  123456    0.031736  18-24   female
0  123456    0.085052  25-34     male
0  123456    0.019994  18-24     male
0  123456    0.085687  35-44     male
0  123456    0.133608  55-64   female
0  123456    0.112345    65+   female
0  123456    0.000317  18-24  unknown
0  123456    0.095208  45-54   female
0  123456    0.067598    65+     male
0  123456    0.086004  45-54     male
0  123456    0.075849  25-34   female
0  123456    0.098699  35-44   female
0  123456    0.003174    65+  unknown
0  123456    0.003174  45-54  unknown
0  123456    0.004126  55-64  unknown

df used:

       ID                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Demographic_distribution
0  123456  [{'percentage': 0.000952, 'age': '25-34', 'gender': 'unknown'}, {'percentage': 0.093621, 'age': '55-64', 'gender': 'male'}, {'percentage': 0.002856, 'age': '35-44', 'gender': 'unknown'}, {'percentage': 0.031736, 'age': '18-24', 'gender': 'female'}, {'percentage': 0.085052, 'age': '25-34', 'gender': 'male'}, {'percentage': 0.019994, 'age': '18-24', 'gender': 'male'}, {'percentage': 0.085687, 'age': '35-44', 'gender': 'male'}, {'percentage': 0.133608, 'age': '55-64', 'gender': 'female'}, {'percentage': 0.112345, 'age': '65+', 'gender': 'female'}, {'percentage': 0.000317, 'age': '18-24', 'gender': 'unknown'}, {'percentage': 0.095208, 'age': '45-54', 'gender': 'female'}, {'percentage': 0.067598, 'age': '65+', 'gender': 'male'}, {'percentage': 0.086004, 'age': '45-54', 'gender': 'male'}, {'percentage': 0.075849, 'age': '25-34', 'gender': 'female'}, {'percentage': 0.098699, 'age': '35-44', 'gender': 'female'}, {'percentage': 0.003174, 'age': '65+', 'gender': 'unknown'}, {'percentage': 0.003174, 'age': '45-54', 'gender': 'unknown'}, {'percentage': 0.004126, 'age': '55-64', 'gender': 'unknown'}]

Comments

0

I did something like this give a try once.

import json
import os
import pandas as pd

#if Demographic Distribution is more then you can iterate through loop

Demographic_distribution = [
    {"percentage": 0.000952, "age": "25-34", "gender": "unknown"}, 
    {"percentage": 0.093621, "age": "55-64", "gender": "male"}, 
    {"percentage": 0.002856, "age": "35-44", "gender": "unknown"}, 
    {"percentage": 0.031736, "age": "18-24", "gender": "female"}, 
    {"percentage": 0.085052, "age": "25-34", "gender": "male"}, 
    {"percentage": 0.019994, "age": "18-24", "gender": "male"}, 
    {"percentage": 0.085687, "age": "35-44", "gender": "male"}]

df = pd.DataFrame.from_dict(Demographic_distribution)
df['ID'] = 123456

df.to_csv("D:\\Path\\Output.csv",index=False)

Output

enter image description here

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.