How to explode two columns of lists with different length using pandas

Question

I have a dataframe with two columns of lists:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['x1','x2','x3', 'x4'], 'B':[['v1','v2'],['v3','v4'],['v6'],['v7','v8']], 'C':[['c1','c2'],['c3','c4'],['c5','c6'],['c7']]})
>>> df
    A         B         C
0  x1  [v1, v2]  [c1, c2]
1  x2  [v3, v4]  [c3, c4]
2  x3      [v6]  [c5, c6]
3  x4  [v7, v8]      [c7]

I would like to explode columns B and C, so the output looks like this:

>>> df_exploded
    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7

My current solution is to first separate the rows where the lists in columns B and C have the same length and run df.explode(["B", "C"]) on them; for the remaining rows, I run df.explode("B") followed by df.explode("C").
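For reference, the split-and-explode approach described above might look roughly like the sketch below. The names same_len, equal, and unequal are illustrative and not from the original post, and multi-column explode is assumed to be available (pandas 1.3 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Boolean mask: True where the lists in B and C have the same length.
same_len = df['B'].str.len() == df['C'].str.len()

# Equal-length rows can be exploded together, element by element.
equal = df[same_len].explode(['B', 'C'])

# Unequal-length rows are exploded one column at a time,
# which yields every (B, C) combination within the row.
unequal = df[~same_len].explode('B').explode('C')

# Recombine, restore the original row order, and renumber the index.
df_exploded = (pd.concat([equal, unequal])
               .sort_index(kind='mergesort')
               .reset_index(drop=True))
print(df_exploded)
```

Note that exploding B and then C produces all combinations, so a row with lists of length 2 and 3 would become 6 rows; in the example data one list in each unequal row has a single element, so the result matches the desired output.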

I am wondering if there's a better solution.

You want two subtly different things here. For lists of equal length, you zip the lists into pairs (e.g. (v1, c1) and (v2, c2)), but for lists of unequal length, you want the combinations (e.g. (v6, c5) and (v6, c6)). Since you want two separate outcomes, you'll be stuck with your current solution, where you handle the two cases separately.
– Swier, Commented Dec 14, 2022 at 15:15
This question is currently lacking detail / is opinion-based. What do you mean by 'better', in objective terms? Are you experiencing an actual problem that you need solved?
– TylerH, Commented Jan 10, 2023 at 16:37

Panda Kim · Accepted Answer · 2022-12-14 15:52:59Z

6

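Use itertools.zip_longest to pair the elements of B and C row by row, then forward-fill the padding within each original row: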

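import itertools

# zip_longest pads the shorter list with None; explode gives one (B, C) pair per row;
# the second apply splits each pair into columns; ffill replaces the padding
# with the last real value from the same original row (same index label).
df1 = (df.apply(lambda x: list(itertools.zip_longest(x['B'], x['C'])), axis=1)
       .explode()
       .apply(lambda x: pd.Series(x, index=['B', 'C']))
       .groupby(level=0).ffill())

df1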

Join df1 back onto column A to get the desired output:

df[['A']].join(df1)

Output:

    A   B   C
0   x1  v1  c1
0   x1  v2  c2
1   x2  v3  c3
1   x2  v4  c4
2   x3  v6  c5
2   x3  v6  c6
3   x4  v7  c7
3   x4  v8  c7

The index above repeats the original row labels. If you want a fresh 0 to n-1 index instead, call reset_index(drop=True) on the result.
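As a self-contained sketch, the steps of this answer can be run end to end as follows, with reset_index(drop=True) applied at the end. The variable name result is illustrative; the sample data is the DataFrame from the question.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Pair B and C per row (None pads the shorter list), explode to one pair
# per row, split pairs into columns, then forward-fill within each
# original row so the padding takes the last real value.
df1 = (df.apply(lambda x: list(itertools.zip_longest(x['B'], x['C'])), axis=1)
       .explode()
       .apply(lambda x: pd.Series(x, index=['B', 'C']))
       .groupby(level=0).ffill())

# Join back onto A and replace the repeated row labels with 0..n-1.
result = df[['A']].join(df1).reset_index(drop=True)
print(result)
```

Unlike exploding B and then C, this repeats only the last element of the shorter list. An empty list would leave missing values in that column, since there is nothing to forward-fill from.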

edited Dec 14, 2022 at 15:52

answered Dec 14, 2022 at 15:44

Panda Kim


esiquiel · Accepted Answer · 2022-12-14 15:09:53Z

-4

Yes, there is a better solution. Instead of separating the rows where the lists in columns B and C have the same length, you can use the explode method on both columns at the same time, and it will automatically take care of rows where the lists have different lengths. Here's how you can do it:

df_exploded = df.explode(["B", "C"])

This will give you the expected output:

    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7

answered Dec 14, 2022 at 15:09

esiquiel


3 Comments

KiwiFT Over a year ago

Thanks for your answer! When I tried this I got the error: ValueError: columns must have matching element counts. Maybe I need to update my pandas?

KiwiFT Over a year ago

I tested with pandas 1.4.2 and 1.5.2 and got the same error.

Azhar Khan Over a year ago

This answer results in error as mentioned by OP.
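The error reported in the comments above can be reproduced with a short sketch like the following, assuming a pandas version with multi-column explode (1.3 or later). The variable error_message is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Multi-column explode requires the lists in each row to have the same
# length; rows 2 and 3 violate that, so a ValueError is raised.
try:
    df.explode(['B', 'C'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So df.explode(["B", "C"]) on its own only works for the rows where B and C have matching element counts.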
