Good morning chaps,

Is there a Pythonic way to explode a DataFrame column into multiple columns of boolean flags, based on some condition (str.contains in this case)?

Let's say I have this:

Position Letter 
1        a      
2        b      
3        c      
4        b      
5        b

And I'd like to achieve this:

Position Letter is_a     is_b    is_c
1        a      TRUE     FALSE   FALSE
2        b      FALSE    TRUE    FALSE
3        c      FALSE    FALSE   TRUE
4        b      FALSE    TRUE    FALSE
5        b      FALSE    TRUE    FALSE 

I can do this with a loop over 'abc' that explicitly creates the new df columns, but I'm wondering whether pandas already has a built-in method for it. The number of possible values, and hence the number of new columns, is variable.
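
For reference, the loop-based approach described here might look like the following minimal sketch. The DataFrame construction and the name `looped` are assumptions made for illustration; the flags come from `str.contains` with `regex=False`, so each letter is matched as a literal substring.

```python
import pandas as pd

# Sample data from the question (the construction itself is an assumption).
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5], "Letter": list("abcbb")})

# Work on a copy so the original frame stays untouched.
looped = df.copy()

# One boolean column per distinct value, so the column count follows the data.
for letter in sorted(looped["Letter"].unique()):
    looped[f"is_{letter}"] = looped["Letter"].str.contains(letter, regex=False)

print(looped)
```

Because `str.contains` does substring matching, this only gives one-hot flags when no value is a substring of another; with values like "a" and "ab", an equality test (`looped["Letter"] == letter`) would probably be the safer choice.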

Thanks and regards.

  • Can you show us a minimal example of what you have tried so far? Please have a look here: stackoverflow.com/help/how-to-ask Commented Nov 15, 2017 at 12:46
  • for lt in Letter: df[lt] = df.Letter.str.contains(lt) Commented Nov 16, 2017 at 14:56

1 Answer

Use Series.str.get_dummies():

In [31]: df.join(df.Letter.str.get_dummies())
Out[31]:
   Position Letter  a  b  c
0         1      a  1  0  0
1         2      b  0  1  0
2         3      c  0  0  1
3         4      b  0  1  0
4         5      b  0  1  0

or, to get boolean flags instead of 0/1:

In [32]: df.join(df.Letter.str.get_dummies().astype(bool))
Out[32]:
   Position Letter      a      b      c
0         1      a   True  False  False
1         2      b  False   True  False
2         3      c  False  False   True
3         4      b  False   True  False
4         5      b  False   True  False
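
To get the is_a / is_b / is_c column names asked for in the question, the same idea can be combined with DataFrame.add_prefix. This is a minimal sketch: the DataFrame construction and the names `flags`, `result` and `alt` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (the construction itself is an assumption).
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5], "Letter": list("abcbb")})

# Indicator columns for each distinct value, cast to bool and renamed is_<value>.
flags = df["Letter"].str.get_dummies().astype(bool).add_prefix("is_")
result = df.join(flags)
print(result)

# Alternative using the top-level function; the default prefix separator is "_".
# The astype(bool) makes the dtype explicit regardless of pandas version.
alt = pd.get_dummies(df["Letter"], prefix="is").astype(bool)
```

Note that Series.str.get_dummies splits each string on "|" by default, so values that themselves contain a pipe would be broken into several flags; pd.get_dummies treats each value as a whole and may be the better fit in that case.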