I have a list of dictionaries which has been converted into a dataframe using json_normalize.

Dataset:

x = [{ 
    "_id" : 71, 
    "Ids" : [
        "10", 
        "59"
    ], 
    "roles" : [
        "janitor", "mechanic", "technician"
    ]
}]

Dataframe:

   _id       Ids                            roles
    71  [10, 59]  [janitor, mechanic, technician]
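
For anyone who wants to reproduce this, here is a minimal sketch of how the dataframe above can be built. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize; the variable name df is my own choice.

```python
import pandas as pd

x = [{
    "_id": 71,
    "Ids": ["10", "59"],
    "roles": ["janitor", "mechanic", "technician"],
}]

# json_normalize flattens nested dicts, but list values stay as
# Python lists inside a single cell.
df = pd.json_normalize(x)
print(df)
```

Each of Ids and roles ends up as one cell holding a list, which is why explode is needed afterwards.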

What I am trying to do is find a way to dynamically explode all the list columns/keys (Ids and roles) without explicitly typing the column names. Is this possible?

Desired Output:

   _id       Ids         roles
    71        10       janitor
    71        10      mechanic
    71        10    technician
    71        59       janitor
    71        59      mechanic
    71        59    technician

Any assistance would be appreciated.

3 Answers

I am not sure about the efficiency of this, but you can simply iterate over the dataframe's column names and check whether each column holds a list. If it does, explode that column.

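df_final = df.copy()

for c in df.columns:
    # Only the first row is inspected: a column counts as a list column
    # if its first cell is a list. iloc[0] is positional, so this also
    # works when the index does not start at 0.
    if isinstance(df[c].iloc[0], list):
        df_final = df_final.explode(c)

Output:

   _id Ids       roles
0   71  10     janitor
0   71  10    mechanic
0   71  10  technician
0   71  59     janitor
0   71  59    mechanic
0   71  59  technician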
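
If the first row of a column might hold a scalar while later rows hold lists, the check can look at every row instead. The following is only a sketch of that variation: the function name explode_list_columns is hypothetical, and it assumes pandas 1.1 or later for the ignore_index argument of explode.

```python
import pandas as pd


def explode_list_columns(df):
    """Explode every column that contains at least one list."""
    out = df.copy()
    for col in df.columns:
        # Look at all rows, not just the first, so a scalar in row 0
        # does not hide lists further down.
        if df[col].map(lambda v: isinstance(v, list)).any():
            # ignore_index=True renumbers the rows instead of repeating
            # the original index label for every exploded element.
            out = out.explode(col, ignore_index=True)
    return out


df = pd.DataFrame({
    "_id": [71],
    "Ids": [["10", "59"]],
    "roles": [["janitor", "mechanic", "technician"]],
})
result = explode_list_columns(df)
print(result)
```

Exploding one column after another yields every combination of the list elements within a row, which matches the desired output in the question.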

A naive solution that iterates through every entry and checks for lists.

It works for multiple rows and for nested lists.

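    import pandas as pd

    # Each pass expands the first list found in every row; repeat until
    # no cell holds a list any more (this is what handles nested lists).
    while True:
        newdf = pd.DataFrame(columns=df.columns)
        for row in df.values:
            for index, value in enumerate(row):
                if isinstance(value, list):
                    # One new row per list element, other cells unchanged.
                    for listentry in value:
                        newdf.loc[len(newdf)] = [*row[:index], listentry, *row[index+1:]] 
                    break
            else:
                # No list in this row: copy it over as is.
                newdf.loc[len(newdf)] = row
        df = newdf.copy(deep=True)
        # On pandas 2.1+, DataFrame.map replaces the deprecated applymap.
        if not newdf.applymap(lambda value:isinstance(value, list)).values.any():
            break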

I would do it like this:

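from itertools import product

import pandas as pd

# Only the first row is read. Scalar cells are wrapped in a one-element
# list so that every column can be passed to product().
list_ = [df.iloc[0, i] if isinstance(df.iloc[0, i], list) else [df.iloc[0, i]] for i in range(df.shape[1])]
prod = list(product(*list_))
df = pd.DataFrame(prod, columns=df.columns)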
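
The snippet above reads only the first row of the dataframe. As a rough sketch of how the same itertools.product idea could be extended to several rows, something like the following might work. The function name explode_by_product and the second sample row are made up for illustration.

```python
from itertools import product

import pandas as pd


def explode_by_product(df):
    """Build the Cartesian product of the list cells, row by row."""
    rows = []
    for record in df.itertuples(index=False, name=None):
        # Wrap scalars in a one-element list so product() can treat
        # every cell the same way.
        cells = [v if isinstance(v, list) else [v] for v in record]
        rows.extend(product(*cells))
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame({
    "_id": [71, 72],
    "Ids": [["10", "59"], ["7"]],
    "roles": [["janitor", "mechanic", "technician"], "clerk"],
})
result = explode_by_product(df)
print(result)
```

One difference from explode worth keeping in mind: a row with an empty list in any cell produces no output rows here, because the product of anything with an empty sequence is empty.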

2 Comments

I assume because there are ways to go about it without importing anything. Also, df.iloc[0,2] assumes that there is a third column. It should be completely dynamic, in the sense that the code should check whether a column contains a list, so it shouldn't assume there will always be 3 columns, as that can vary each time.
@mdbuzzer Thanks for your comment. The code is now independent of the number of columns.
