3

I am new to Pandas framework and I have searched enough to resolve my issue but did not get much help online.

I have a string column as given below and I want to convert it into separate columns. My problem here is I have tried splitting it but it did not give me the output the way I need.

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*

here is the code I used to split my string but did not give me expected output.

df = data["Total Visitor"].str.split(",", n = 1, expand = True)

My Expected Output should be as following table after splitting the string:

*----------------------------------------------------------------------------------------------------------------*
|  Adult    | Adult + Audio Guide    | Youth   | Children    | Children + AG        | Senior + AG                                                                       
*----------------------------------------------------------------------------------------------------------------*
|  2x Adult | 1x Adult + Audio Guide |    -    |       -     |    -                    | -  
|
|  2x Adult |          -             |2x Youth | 1x Children |    -                    | -                               
|      -    | 5x Adult + Audio Guide |    -    |      -      |1x Children + Audio Guide| 1x Senior + Audio Guide |
*----------------------------------------------------------------------------------------------------------------*

How can I do this? Any help or guidance would be great.

2 Answers 2

6

Idea is create list of dictionaries with keys of removed numbers with x by regex - ^\d+x\s+ (^ is start of string, \d+ is one or more integers and \s+ is one or more whitespaces) and pass to DataFrame constructor:

import re

L =[dict([(re.sub('^\d+x\s+',"",y),y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
print (df)
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  

Another similar idea is split by x for columns names from keys of dicts:

L = [dict([(y.split('x ')[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
Sign up to request clarification or add additional context in comments.

Comments

2

Here is a way using pandas methods:

dstack = df['Total Visitor'].str.split(',', expand=True).stack().str.strip().to_frame()
dstack['cols'] = dstack[0].str.extract(r'\d+x\s(.*)')
df_out = dstack.set_index('cols', append=True)[0].reset_index(level=1, drop=True).unstack()
df_out

Output:

cols     Adult     Adult + Audio Guide     Children     Children + Audio Guide     Senior + Audio Guide     Youth
0     2x Adult  1x Adult + Audio Guide          NaN                        NaN                      NaN       NaN
1     2x Adult                     NaN  1x Children                        NaN                      NaN  2x Youth
2          NaN  5x Adult + Audio Guide          NaN  1x Children + Audio Guide  1x Senior + Audio Guide       NaN

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.