Explode single DataFrame row into multiple ones

Question

My DataFrame has some columns where each value can be "1", "2", "3" or "any". Here is an example:

>>> df = pd.DataFrame({'a': ['1', '2', 'any', '3'], 'b': ['any', 'any', '3', '1']})
>>> df
     a    b
0    1  any
1    2  any
2  any    3
3    3    1

In my case, "any" means that the value can be "1", "2" or "3". I would like to generate all possible rows using only values "1", "2" and "3" (or, in general, any list of values that I might have). Here is the expected output for the example above:

I got this output with this kind of ugly and complicated approach:

a = df['a'].replace('any', '1,2,3').apply(lambda x: eval(f'[{str(x)}]')).explode()
result = pd.merge(df.drop(columns=['a']), a, left_index=True, right_index=True)
b = result['b'].replace('any', '1,2,3').apply(lambda x: eval(f'[{str(x)}]')).explode()
result = pd.merge(result.drop(columns=['b']), b, left_index=True, right_index=True)
result = result.drop_duplicates().reset_index(drop=True)

Is there any simpler and/or nicer approach?

Quang Hoang · Accepted Answer · 2021-04-14 16:32:44Z

4

You can replace the string 'any' with, e.g., '1,2,3', then split and explode:

(df.replace('any', '1,2,3')
   .apply(lambda x: x.str.split(',') if x.name in ['a','b'] else x)
   .explode('a').explode('b')
   .drop_duplicates(['a','b'])
)

Output:

edited Apr 14, 2021 at 16:32

answered Apr 14, 2021 at 16:23

Quang Hoang


1 Comment

Riccardo Bucco Over a year ago

What if my DataFrame has other columns? With your solution, if I have a column (let's call it 'c') with integer values, this approach results in an error.
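One way to address the extra-column case raised in this comment is to expand only the columns that can contain 'any' and leave everything else alone. The following is a minimal sketch, not the answerer's own fix: the integer column `c`, its values, and the names `wildcard_cols` and `values` are assumptions made for illustration. It relies on `DataFrame.explode(..., ignore_index=True)`, which requires pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['1', '2', 'any', '3'],
    'b': ['any', 'any', '3', '1'],
    'c': [10, 20, 30, 40],  # hypothetical integer column, never touched below
})

wildcard_cols = ['a', 'b']   # assumed: the only columns that may hold 'any'
values = ['1', '2', '3']     # assumed: what 'any' stands for

out = df.copy()
for col in wildcard_cols:
    # Replace 'any' with the full list of values; other cells stay scalars.
    out[col] = out[col].map(lambda v: values if v == 'any' else v)
    # explode() expands list cells into rows and passes scalars through.
    out = out.explode(col, ignore_index=True)

out = out.drop_duplicates().reset_index(drop=True)
print(out)
```

Because only the columns named in `wildcard_cols` are mapped and exploded, no string method is ever applied to `c`.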

Eelco van Vliet · Accepted Answer · 2021-04-15 07:31:58Z

1

I would not use eval and string manipulation, but just replace 'any' with a set of values:

import pandas as pd
df = pd.DataFrame({'a': ['1', '2', 'any', '3'], 'b': ['any', 'any', '3', '1']})
df['c'] = '1'

df[df == 'any'] = {'1', '2', '3'}
for col in df:
    df = df.explode(col)
df = df.drop_duplicates().reset_index(drop=True)
print(df)

This gives the result

edited Apr 15, 2021 at 7:31

answered Apr 14, 2021 at 16:55

Eelco van Vliet
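A note on ordering: exploding a set does not guarantee the order of the resulting rows, since sets are unordered. If a reproducible order matters, one alternative is to build the combinations directly with `itertools.product`. This is a minimal sketch of a different technique, not part of either answer; the names `values`, `rows` and `result` are chosen here for illustration, and it assumes every column may contain 'any'.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'a': ['1', '2', 'any', '3'], 'b': ['any', 'any', '3', '1']})
values = ['1', '2', '3']  # assumed: what 'any' stands for

rows = set()
for row in df.itertuples(index=False):
    # Each cell contributes either all allowed values or just itself.
    choices = [values if v == 'any' else [v] for v in row]
    # product() yields every combination; the set removes duplicates.
    rows.update(product(*choices))

# Sorting the tuples gives a deterministic row order.
result = pd.DataFrame(sorted(rows), columns=df.columns)
print(result)
```

The set takes the place of `drop_duplicates()`, and sorting the tuples fixes the row order across runs.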
