
I have a DataFrame with multiple columns. Two of the columns, col2 and col3, contain lists, and within each row the two lists have the same length.

My goal is to put each list element on its own row.

I can use df.explode(), but it only accepts one column, and I want the two columns to be exploded together as pairs. If I run df.explode('col2') and then df.explode('col3'), each original row becomes 9 rows instead of 3.
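For illustration, here is a minimal sketch of that behaviour on a single-row frame. The frame is made up for the example; it only mirrors the column names used in this question.

```python
import pandas as pd

# Hypothetical one-row frame with the same column layout as in the question.
df = pd.DataFrame(
    {
        "col1": ["aa"],
        "col2": [[1, 2, 3]],
        "col3": [[1.1, 2.2, 3.3]],
    }
)

# Exploding col2 first copies the whole col3 list onto each of the 3 new rows;
# exploding col3 afterwards expands each of those copies again.
chained = df.explode("col2").explode("col3")

print(len(chained))  # every col2 value ends up paired with every col3 value
```

Because the second explode runs on rows that each still hold the full col3 list, the result is the cross product of the two lists rather than the element-wise pairs.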

Original DF

col0      col1        col2        col3
1       aa          [1,2,3]     [1.1,2.2,3.3]
2       bb          [4,5,6]     [4.4,5.5,6.6]
3       cc          [7,8,9]     [7.7,8.8,9.9]
3       cc          [7,8,9]     [7.7,8.8,9.9]

End DataFrame

col0    col1        col2        col3
1       aa          1           1.1
1       aa          2           2.2
1       aa          3           3.3
2       bb          4           4.4
2       bb          5           5.5
2       bb          6           6.6
3       cc          ...         ...

Update: None of the columns has unique values, so none of them can be used as an index.

  • Better answer can be found here. No need to use set_index and reset_index. Commented Apr 14, 2021 at 18:43

2 Answers


You could set col1 as index and apply pd.Series.explode across the columns:

df.set_index('col1').apply(pd.Series.explode).reset_index()

Or:

df.apply(pd.Series.explode)


   col1 col2 col3
0    aa    1  1.1
1    aa    2  2.2
2    aa    3  3.3
3    bb    4  4.4
4    bb    5  5.5
5    bb    6  6.6
6    cc    7  7.7
7    cc    8  8.8
8    cc    9  9.9
9    cc    7  7.7
10   cc    8  8.8
11   cc    9  9.9

15 Comments

Thanks, I think it would give an error because none of the columns has unique values, so none of them can be used as an index.
ValueError: cannot reindex from a duplicate axis. This is the error I get when I run the following command. Please advise.
I have pandas 1.2.0 and I'm getting the same ValueError. I actually had a MultiIndex and tried dropping it for the sake of reproducibility, but it didn't work either way.
No need to set_index and reset_index. Just use df = df.apply(pd.Series.explode). This will explode all the columns with lists in your dataframe.
@MayankPorwal I still get ValueError: cannot reindex from a duplicate axis
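For anyone hitting this ValueError, one way to sidestep index alignment entirely is to build the result by hand. The following is only a sketch, not the approach from the answer above: it assumes col2 and col3 have matching lengths in every row, and it rebuilds the sample frame from the question so that it runs on its own.

```python
from itertools import chain

import pandas as pd

# Sample frame from the question, including the duplicated last row.
df = pd.DataFrame(
    {
        "col0": [1, 2, 3, 3],
        "col1": ["aa", "bb", "cc", "cc"],
        "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [7, 8, 9]],
        "col3": [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9], [7.7, 8.8, 9.9]],
    }
)

# Number of list elements per row (assumed identical for col2 and col3).
lengths = df["col2"].map(len)

exploded = pd.DataFrame(
    {
        # Repeat each scalar value once per list element; to_numpy() drops the
        # repeated index so no alignment on duplicate labels is attempted.
        "col0": df["col0"].repeat(lengths).to_numpy(),
        "col1": df["col1"].repeat(lengths).to_numpy(),
        # Flatten the lists row by row, which keeps the col2/col3 pairs aligned.
        "col2": list(chain.from_iterable(df["col2"])),
        "col3": list(chain.from_iterable(df["col3"])),
    }
)

print(exploded)
```

Since nothing here is used as an index, duplicate values in col0 or col1 do not matter, and the result gets a fresh 0..n-1 index.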

I borrowed this solution from other answers (forgot where):

df.explode(['col2', 'col3'])

The advantage: faster than the apply solution.

Make sure both col2 and col3 have the same number of elements in cells in the same row.
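As a rough end-to-end sketch of the call above, using the sample frame from the question: passing a list of columns to explode requires pandas 1.3.0 or later, and ignore_index=True is an optional extra added here to get a clean 0..n-1 index.

```python
import pandas as pd

# Sample frame from the question, including the duplicated last row.
df = pd.DataFrame(
    {
        "col0": [1, 2, 3, 3],
        "col1": ["aa", "bb", "cc", "cc"],
        "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [7, 8, 9]],
        "col3": [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9], [7.7, 8.8, 9.9]],
    }
)

# Explode both list columns together so elements stay paired row by row.
# ignore_index=True renumbers the rows instead of repeating the original labels.
exploded = df.explode(["col2", "col3"], ignore_index=True)

print(exploded)
```

The scalar columns (col0 and col1) are repeated for each list element, so duplicate values in them are not a problem.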

1 Comment

Awesome, exactly what I was looking for. Note: It requires pandas >=1.3.0
