
I have a dataframe like the one below:

     ColumnA      ColumnB         ColumnC
0       usr       usr1,usr2       X1
1       xyz       xyz1,xyz2,xyz3  X2
2       abc       abc1,abc2,abc3  X3

What I want to do is:

  • split ColumnB by ","

  • The problem is that some cells of ColumnB have 3 values (xyz1,xyz2,xyz3), others have 6, and so on; the number of values per cell is not fixed.

Expected output:

     ColumnA      ColumnB          usercol1    usercol2    usercol3  ColumnC
0       usr       usr1,usr2           usr1      usr2           -       X1
1       xyz       xyz1,xyz2,xyz3      xyz1      xyz2          xyz3     X2
2       abc       abc1,abc2,abc3      abc1      abc2          abc3     X3
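For reference, a minimal sketch that rebuilds the sample frame above and counts how many comma-separated values each ColumnB cell holds (assuming the cells are plain strings, as shown):

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    'ColumnA': ['usr', 'xyz', 'abc'],
    'ColumnB': ['usr1,usr2', 'xyz1,xyz2,xyz3', 'abc1,abc2,abc3'],
    'ColumnC': ['X1', 'X2', 'X3'],
})

# Number of comma-separated values in each row; this varies from row to row
counts = df['ColumnB'].str.split(',').str.len()
print(counts.tolist())  # [2, 3, 3]
print(counts.max())     # 3
```

The maximum of these counts is the number of usercol columns the output needs.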
  • kindly post your expected output Commented Oct 9, 2020 at 10:16
  • Thanks for your interest; I added the expected output. Commented Oct 9, 2020 at 10:21

1 Answer

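  1. Create a new dataframe by calling str.split() with expand=True, which puts each comma-separated value in its own column.
  2. Then concat the first two original columns, the new expanded dataframe, and the third original column. This adapts to uneven list lengths: rows with fewer values are padded with missing values, which are then replaced with '-'.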
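Step 1 on its own, as a sketch on the sample data: with expand=True the result has as many columns as the longest split, and shorter rows are padded with missing values (shown as None or NaN depending on the pandas version, so the check below uses pd.isna).

```python
import pandas as pd

df = pd.DataFrame({
    'ColumnA': ['usr', 'xyz', 'abc'],
    'ColumnB': ['usr1,usr2', 'xyz1,xyz2,xyz3', 'abc1,abc2,abc3'],
    'ColumnC': ['X1', 'X2', 'X3'],
})

# One column per comma-separated value; width = longest split (3 here)
parts = df['ColumnB'].str.split(',', expand=True).add_prefix('usercol')
print(parts.columns.tolist())  # ['usercol0', 'usercol1', 'usercol2']

# The 'usr' row only has two values, so its third cell is missing
print(pd.isna(parts.loc[0, 'usercol2']))  # True
```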

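import numpy as np
import pandas as pd
# Split ColumnB on ","; expand=True gives one column per value, padding short rows
df1 = df['ColumnB'].str.split(',', expand=True).add_prefix('usercol')
df1 = pd.concat([df[['ColumnA', 'ColumnB']], df1, df[['ColumnC']]], axis=1).replace(np.nan, '-')
df1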
Out[1]: 
     ColumnA      ColumnB          usercol0    usercol1    usercol2  ColumnC
0       usr       usr1,usr2           usr1      usr2          -        X1
1       xyz       xyz1,xyz2,xyz3      xyz1      xyz2          xyz3     X2
2       abc       abc1,abc2,abc3      abc1      abc2          abc3     X3
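The expected output in the question numbers the new columns from 1 (usercol1 to usercol3), while add_prefix on the default integer labels starts at 0. One way to match it, sketched here, is to shift the integer labels by one before prefixing. This sketch also uses fillna('-') in place of replace(np.nan, '-'); both should fill the padded cells on this data.

```python
import pandas as pd

df = pd.DataFrame({
    'ColumnA': ['usr', 'xyz', 'abc'],
    'ColumnB': ['usr1,usr2', 'xyz1,xyz2,xyz3', 'abc1,abc2,abc3'],
    'ColumnC': ['X1', 'X2', 'X3'],
})

parts = df['ColumnB'].str.split(',', expand=True)
# Shift the integer labels 0, 1, 2 -> 1, 2, 3 so numbering starts at 1
parts.columns = parts.columns + 1
parts = parts.add_prefix('usercol')

out = pd.concat([df[['ColumnA', 'ColumnB']], parts, df[['ColumnC']]], axis=1).fillna('-')
print(out.columns.tolist())
# ['ColumnA', 'ColumnB', 'usercol1', 'usercol2', 'usercol3', 'ColumnC']
print(out.loc[0, 'usercol3'])  # -
```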

Technically, this can also be done in a single statement:

df = pd.concat([df[['ColumnA', 'ColumnB']],
                df['ColumnB'].str.split(',',expand=True).add_prefix('usercol'),
                df[['ColumnC']]], axis=1).replace(np.nan, '-')
df
Out[1]: 
  ColumnA         ColumnB usercol0 usercol1 usercol2 ColumnC
0     usr       usr1,usr2     usr1     usr2        -      X1
1     xyz  xyz1,xyz2,xyz3     xyz1     xyz2     xyz3      X2
2     abc  abc1,abc2,abc3     abc1     abc2     abc3      X3
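If the real frame has more columns than this example, listing them by hand in pd.concat gets tedious. A sketch of a small helper (the name split_after and its parameters are hypothetical, not a pandas API) that places the split columns directly after the source column, wherever it sits:

```python
import pandas as pd

def split_after(df, col, sep=',', prefix='usercol', fill='-'):
    """Hypothetical helper: split `col` on `sep` and insert the pieces
    as new columns immediately to the right of `col`."""
    parts = df[col].str.split(sep, expand=True)
    # Number the new columns from 1, as in the question's expected output
    parts.columns = [f'{prefix}{i}' for i in range(1, parts.shape[1] + 1)]
    pos = df.columns.get_loc(col) + 1  # position just after the source column
    left, right = df.iloc[:, :pos], df.iloc[:, pos:]
    # Like the answer's replace(), this fills missing cells in every column
    return pd.concat([left, parts, right], axis=1).fillna(fill)

df = pd.DataFrame({
    'ColumnA': ['usr', 'xyz', 'abc'],
    'ColumnB': ['usr1,usr2', 'xyz1,xyz2,xyz3', 'abc1,abc2,abc3'],
    'ColumnC': ['X1', 'X2', 'X3'],
})
out = split_after(df, 'ColumnB')
print(out)
```

Because the position comes from get_loc, the same call works if ColumnB is moved or more columns are added on either side.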