
I have a dataframe with two columns of lists:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['x1','x2','x3', 'x4'], 'B':[['v1','v2'],['v3','v4'],['v6'],['v7','v8']], 'C':[['c1','c2'],['c3','c4'],['c5','c6'],['c7']]})
>>> df
    A         B         C
0  x1  [v1, v2]  [c1, c2]
1  x2  [v3, v4]  [c3, c4]
2  x3      [v6]  [c5, c6]
3  x4  [v7, v8]      [c7]

I would like to explode columns B and C, so the output looks like this:

>>> df_exploded
    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7

My current solution is to first separate the rows where the lists in columns B and C have the same length and run df.explode(["B", "C"]) on them; for the remaining rows, I run df.explode("B") followed by df.explode("C").
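For reference, the split-and-explode approach described above can be sketched roughly as follows. This is a minimal sketch, assuming pandas 1.3 or later (needed for multi-column explode); the names same_len, paired, and crossed are illustrative and not taken from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Rows whose B and C lists have the same number of elements
same_len = df['B'].str.len() == df['C'].str.len()

# Equal lengths: explode both columns together (element-wise pairing)
paired = df[same_len].explode(['B', 'C'])

# Unequal lengths: explode one column at a time (all combinations)
crossed = df[~same_len].explode('B').explode('C')

# Restore the original row order; mergesort is stable, so the order
# of the exploded rows within each original row is preserved
df_exploded = (pd.concat([paired, crossed])
               .sort_index(kind='mergesort')
               .reset_index(drop=True))
```

Note that exploding B and then C yields every combination of the two lists, which matches the desired output here only because one of the lists in each unequal row has a single element.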

I am wondering if there's a better solution.

  • You want two subtly different things here. For lists of equal length, you zip the lists into pairs (e.g. (v1, c1) and (v2, c2)), but for lists of unequal length, you want the combinations (e.g. (v6, c5) and (v6, c6)). Since these are two separate outcomes, you'll be stuck with your current solution, where you handle the two cases separately. Commented Dec 14, 2022 at 15:15
  • This question is currently lacking detail / is opinion-based. What do you mean by 'better', in objective terms? Are you experiencing an actual problem that you need solved? Commented Jan 10, 2023 at 16:37

2 Answers


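Use itertools.zip_longest to pair up the elements of B and C row by row, then forward-fill the padding within each original row: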

import itertools

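# zip_longest pairs B and C element-wise, padding the shorter list with None
df1 = (df.apply(lambda x: list(itertools.zip_longest(x['B'], x['C'])), axis=1)
       .explode()                                        # one (B, C) tuple per row
       .apply(lambda x: pd.Series(x, index=['B', 'C']))  # split each tuple into two columns
       .groupby(level=0).ffill())                        # fill the padding within each original row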

df1

    B   C
0   v1  c1
0   v2  c2
1   v3  c3
1   v4  c4
2   v6  c5
2   v6  c6
3   v7  c7
3   v8  c7



Join df1 back onto column A to get the desired output:

df[['A']].join(df1)

output:

    A   B   C
0   x1  v1  c1
0   x1  v2  c2
1   x2  v3  c3
1   x2  v4  c4
2   x3  v6  c5
2   x3  v6  c6
3   x4  v7  c7
3   x4  v8  c7

If you want a sequential index instead of the repeated one, call reset_index(drop=True) on the result.
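A variation on the same idea, which is not the code above: pad the shorter list in plain Python by repeating its last element, then do a single multi-column explode. This is only a sketch, assuming pandas 1.3 or later and that no list is empty; pad_to_match is a hypothetical helper name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

def pad_to_match(b, c):
    """Repeat the last element of the shorter list until both lengths match.

    Hypothetical helper; assumes neither list is empty.
    """
    n = max(len(b), len(c))
    return b + b[-1:] * (n - len(b)), c + c[-1:] * (n - len(c))

# Pad every (B, C) pair so the lists in each row have equal length
padded = [pad_to_match(b, c) for b, c in zip(df['B'], df['C'])]

# Wrap in pd.Series so each list is stored as a single cell
df_padded = df.assign(
    B=pd.Series([b for b, _ in padded], index=df.index),
    C=pd.Series([c for _, c in padded], index=df.index),
)

# With matching lengths, one multi-column explode is enough
df_exploded = df_padded.explode(['B', 'C']).reset_index(drop=True)
```

Like the forward-fill version, this repeats the last available value rather than producing all combinations, so the two approaches agree on the sample data.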



Yes, there is a better solution. Instead of separating the rows where the lists in columns B and C have the same length, you can use the explode method on both columns at the same time, and it will automatically take care of rows where the lists have different lengths. Here's how you can do it:

df_exploded = df.explode(["B", "C"])

This will give you the expected output:

    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7

3 Comments

Thanks for your answer! When I tried this I got the error: ValueError: columns must have matching element counts. Maybe I need to update my pandas?
I tested with pandas 1.4.2 and 1.5.2 and got the same error.
This answer results in error as mentioned by OP.
