Dynamically explode list columns in Pandas

Question

I have a list of dictionaries which has been converted into a dataframe using json_normalize.

Dataset:

x = [{ 
    "_id" : 71, 
    "Ids" : [
        "10", 
        "59"
    ], 
    "roles" : [
        "janitor", "mechanic", "technician"
    ]
}]

Dataframe:

   _id       Ids                            roles
    71  [10, 59]  [janitor, mechanic, technician]
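For anyone who wants to reproduce the dataframe above, this is a minimal sketch of the setup. It assumes a pandas version where `json_normalize` is available at the top level as `pd.json_normalize` (1.0 or later).

```python
import pandas as pd

# The dataset from the question: one record with two list-valued keys
x = [{
    "_id": 71,
    "Ids": ["10", "59"],
    "roles": ["janitor", "mechanic", "technician"],
}]

# json_normalize flattens nested dicts but leaves lists as single cells
df = pd.json_normalize(x)
print(df)
```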

What I am trying to do is find a way to dynamically explode all the list columns/keys (Ids and roles) without explicitly typing the column names. Is this possible?

Desired Output:

   _id       Ids         roles
    71        10       janitor
    71        10      mechanic
    71        10    technician
    71        59       janitor
    71        59      mechanic
    71        59    technician

Any assistance would be appreciated.

Epsi95 · Accepted Answer · 2021-02-19 10:18:49Z

2

I am not sure how efficient this is, but it simply iterates over the dataframe's column names and checks whether each column holds a list. If it does, the column is exploded.

df_final = df.copy()

for c in df.columns:
    # Inspect the first row by position; df[c][0] would fail if the index has no label 0
    if isinstance(df[c].iloc[0], list):
        df_final = df_final.explode(c)

    _id Ids roles
0   71  10  janitor
0   71  10  mechanic
0   71  10  technician
0   71  59  janitor
0   71  59  mechanic
0   71  59  technician
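The same idea can be wrapped in a function that looks at every row rather than only the first one. This is a sketch, not part of the original answer: `explode_list_columns` is a hypothetical helper name, and `ignore_index=True` assumes pandas 1.1 or later.

```python
import pandas as pd


def explode_list_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Explode every column in which at least one cell holds a list."""
    # Detect list columns by scanning all cells, not just the first row
    list_cols = [
        c for c in df.columns
        if df[c].map(lambda v: isinstance(v, list)).any()
    ]
    out = df
    for c in list_cols:
        # ignore_index=True renumbers rows instead of repeating the old index
        out = out.explode(c, ignore_index=True)
    return out


df = pd.DataFrame({
    "_id": [71],
    "Ids": [["10", "59"]],
    "roles": [["janitor", "mechanic", "technician"]],
})
result = explode_list_columns(df)
print(result)
```

Exploding the columns one after another yields every combination of their elements, which matches the desired output in the question. Note that `explode` turns an empty list into a single row containing NaN.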

answered Feb 19, 2021 at 10:18


Lukas Schmid · Accepted Answer · 2021-02-19 10:36:58Z

A naive solution that iterates through all entries and checks each one for a list.

It works with multiple rows and with nested lists.

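    # Assumes pandas is imported as pd and df is the dataframe to flatten.
    while True:
        newdf = pd.DataFrame(columns=df.columns)
        for row in df.values:
            for index, value in enumerate(row):
                if isinstance(value, list):
                    # Expand the first list found into one new row per element
                    for listentry in value:
                        newdf.loc[len(newdf)] = [*row[:index], listentry, *row[index+1:]]
                    break
            else:
                # No list in this row: copy it unchanged
                newdf.loc[len(newdf)] = row
        df = newdf.copy(deep=True)
        # Repeat until no cell holds a list (applymap is deprecated since pandas 2.1; use DataFrame.map there)
        if not newdf.applymap(lambda value: isinstance(value, list)).values.any():
            break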
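For comparison, the nested-list case can also be handled by calling `explode` repeatedly until no list cells remain. This is a sketch under assumptions, not the answer's own method: `explode_until_flat` is a hypothetical name, and `ignore_index=True` requires pandas 1.1 or later.

```python
import pandas as pd


def explode_until_flat(df: pd.DataFrame) -> pd.DataFrame:
    """Explode list cells repeatedly until no cell holds a list."""
    out = df.reset_index(drop=True)
    while True:
        # Columns that still contain at least one list
        list_cols = [
            c for c in out.columns
            if out[c].map(lambda v: isinstance(v, list)).any()
        ]
        if not list_cols:
            return out
        for c in list_cols:
            # Each pass removes one level of nesting from column c
            out = out.explode(c, ignore_index=True)


# Hypothetical data: one nested list and one plain scalar
nested = pd.DataFrame({
    "_id": [1, 2],
    "vals": [[["a", "b"], ["c"]], "d"],
})
flat = explode_until_flat(nested)
print(flat)
```

Each pass strips one level of nesting, so the loop ends once the deepest list has been expanded.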

Giovanni Frison · Accepted Answer · 2021-02-19 10:47:40Z

0

I would do it like this:

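from itertools import product

import pandas as pd

# Wrap scalar cells in a list so every cell is iterable (only the first row is used)
list_ = [df.iloc[0, i] if isinstance(df.iloc[0, i], list) else [df.iloc[0, i]] for i in range(df.shape[1])]
prod = list(product(*list_))
df = pd.DataFrame(prod, columns=df.columns)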
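The snippet above reads only the first row. A possible extension to dataframes with several rows is sketched here; `product_explode` is a hypothetical helper name and the second row of sample data is invented for illustration.

```python
from itertools import product

import pandas as pd


def product_explode(df: pd.DataFrame) -> pd.DataFrame:
    """Build the Cartesian product of the list cells in every row."""
    rows = []
    for record in df.itertuples(index=False, name=None):
        # Wrap scalars so that product() can iterate over every cell
        cells = [v if isinstance(v, list) else [v] for v in record]
        rows.extend(product(*cells))
    return pd.DataFrame(rows, columns=df.columns)


# Hypothetical two-row input; row 2 mixes a list with a scalar
df = pd.DataFrame({
    "_id": [71, 72],
    "Ids": [["10", "59"], ["7"]],
    "roles": [["janitor", "mechanic"], "clerk"],
})
result = product_explode(df)
print(result)
```

Unlike `DataFrame.explode`, this drops a row entirely if any of its cells is an empty list, because the product with an empty sequence is empty.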

edited Feb 19, 2021 at 10:47

answered Feb 19, 2021 at 10:20


2 Comments

jcoke Over a year ago

I assume it is because there are ways to do this without importing anything. Also, df.iloc[0,2] assumes that there is a third column. The code should be completely dynamic: it should check whether each column contains a list rather than assume there will always be 3 columns, since the number can vary each time.

Giovanni Frison Over a year ago

@mdbuzzer Thanks for your comment. The code is now independent of the number of columns.
