Explode the list values in dataframe columns

Question

I have a dataframe with the following values:

sentence_id  words                    labels
3822445      ['a', 'b', 'c', '']      ['B-PER', 'I-PER', 'I-PER', 'I-PER']
3822446      ['d', 'e', '']           ['B-PER', 'I-PER', 'I-PER']
3822447      ['f', 'g', 'h']          ['B-PER', 'I-PER', 'I-PER']

Expected output:

sentence_id  words    labels    
3822445       'a'     'B-PER'
3822445       'b'     'I-PER'
3822445       'c'     'I-PER'
3822445       ''      'I-PER'
3822446       'd'     'B-PER'
3822446       'e'     'I-PER'
3822446       ''      'I-PER'
3822447       'f'     'B-PER'
3822447       'g'     'I-PER'
3822447       'h'     'I-PER'

I have tried:

dataframe.set_index(['sentence_id']).apply(pd.Series.explode).reset_index()

but it returns the same output as the input. I don't know what's going wrong.

This looks like stackoverflow.com/questions/63583059/…. Just remove the .query part from my solution there, since you want to keep the empty strings.
– ALollz, Commented Feb 22, 2021 at 17:35

Ric S · Accepted Answer · 2021-02-22 17:59:53Z

If you want a simple one-liner, you can use explode with pandas>=0.25.0:

df.explode('words').assign(labels=df['labels'].explode())

Actually, it was a silly mistake: when I read the CSV, each list of words was loaded as a string rather than as a list.
– Shyam, Commented over a year ago
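
The comment above points to the likely cause: cells read from a CSV arrive as strings such as "['a', 'b', 'c', '']", and explode leaves plain strings untouched, so the output matches the input. Here is a minimal sketch of one way to fix that, assuming the cells hold Python-literal list strings. The inline CSV text is made up for illustration and stands in for the real file; ast.literal_eval is passed through the converters argument of read_csv.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file. The list columns
# are stored as quoted strings, which is what read_csv returns by default.
csv_text = '''sentence_id,words,labels
3822445,"['a', 'b', 'c', '']","['B-PER', 'I-PER', 'I-PER', 'I-PER']"
3822446,"['d', 'e', '']","['B-PER', 'I-PER', 'I-PER']"
'''

raw = pd.read_csv(io.StringIO(csv_text))
# Each cell is a str here, so explode has nothing to expand.
assert isinstance(raw.loc[0, 'words'], str)

# Parse the string cells into real Python lists while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'words': ast.literal_eval, 'labels': ast.literal_eval},
)

# Same call as in the question, now operating on real lists.
exploded = df.set_index('sentence_id').apply(pd.Series.explode).reset_index()
print(exploded)
```

If the dataframe is already loaded, applying ast.literal_eval to each affected column (for example with Series.apply) should have the same effect.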

Scott Boston · Accepted Answer · 2021-07-06 14:53:02Z

Update for pandas 1.3.0

pandas.DataFrame.explode now accepts a list of column headers:

df.explode(['words','labels'], ignore_index=True)

Output:

   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER
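
One caveat worth checking before relying on the multi-column form: every row's lists must have the same number of elements across the exploded columns, otherwise explode raises a ValueError. The sketch below, assuming pandas 1.3.0 or later and a shortened copy of the sample data, verifies the lengths first and then explodes both columns together.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    'sentence_id': [3822445, 3822446],
    'words': [['a', 'b', 'c', ''], ['d', 'e', '']],
    'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
               ['B-PER', 'I-PER', 'I-PER']],
})

# Compare per-row list lengths between the two columns.
same_length = df['words'].map(len) == df['labels'].map(len)
assert same_length.all(), "words and labels differ in length for some rows"

# Explode both columns in lockstep; ignore_index renumbers rows from 0.
out = df.explode(['words', 'labels'], ignore_index=True)
print(out)
```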

This works fine for me. What are your unexpected results?

import pandas as pd

df = pd.DataFrame({'sentence_id': [3822445, 3822446, 3822447],
                   'words': [['a', 'b', 'c', ''],
                             ['d', 'e', ''],
                             ['f', 'g', 'h']],
                   'labels': [['B-PER', 'I-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER'],
                              ['B-PER', 'I-PER', 'I-PER']]})

df.set_index('sentence_id').apply(pd.Series.explode).reset_index()

Output:

   sentence_id words labels
0      3822445     a  B-PER
1      3822445     b  I-PER
2      3822445     c  I-PER
3      3822445        I-PER
4      3822446     d  B-PER
5      3822446     e  I-PER
6      3822446        I-PER
7      3822447     f  B-PER
8      3822447     g  I-PER
9      3822447     h  I-PER

Actually, it was a silly mistake: when I read the CSV, each list of words was loaded as a string rather than as a list.
