I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})

   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

I need to transform it into this form:

   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

The value column can have anywhere from 1 to 10 distinct levels for each id. If a level is not present for an id, its column should be 0; otherwise it should be 1.

I am using Anaconda Python 3.5 on Windows 10.

  • If 2 1 1 c is changed to 2 1 1 a, what is the output? Commented Jun 24, 2017 at 5:15
  • Sorry for the confusion. I will only have one instance of each value, meaning that for each id there will be only one 'a'. I only need to check presence with binary values. Also, id1 and id2 will be exactly the same. Commented Jun 24, 2017 at 5:33
  • OK, so the first 2 solutions are for you. Commented Jun 24, 2017 at 5:34
  • Can you please help me with the case where I need the count? For example, if 2 1 1 c is changed to 2 1 1 a and I need to get the count. Commented Jun 24, 2017 at 5:54

1 Answer

If you need only 1 and 0 in the output, marking the presence of each value:

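You can use get_dummies on the Series created by set_index. Because get_dummies returns one indicator row per original row, groupby + GroupBy.max is then necessary to collapse them into one row per id pair: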

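# Parentheses allow the method chain to span several lines.
# astype(int) guarantees 0/1 output even where get_dummies returns booleans.
df = (pd.get_dummies(df.set_index(['id1','id2'])['value'])
        .groupby(level=[0,1])
        .max()
        .astype(int)
        .reset_index())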
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Another solution uses groupby, size and unstack, but then it is necessary to compare with gt and convert to int with astype. Finish with reset_index and rename_axis:

# Parentheses allow the method chain to span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .gt(0)
        .astype(int)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
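
To check that the two presence solutions above agree, here is a minimal self-contained sketch that runs both on the question's sample data. The names via_dummies, via_size and same are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# Approach 1: one-hot encode the values, then collapse each (id1, id2)
# group with max so a column is 1 if the value occurs at least once.
via_dummies = (pd.get_dummies(df.set_index(['id1', 'id2'])['value'])
                 .groupby(level=[0, 1])
                 .max()
                 .astype(int)
                 .reset_index())

# Approach 2: count rows per (id1, id2, value), spread the values into
# columns, then turn the counts into 0/1 flags.
via_size = (df.groupby(['id1', 'id2', 'value'])
              .size()
              .unstack(fill_value=0)
              .gt(0)
              .astype(int)
              .reset_index()
              .rename_axis(None, axis=1))

# Compare the raw cell values of both results.
same = via_dummies.values.tolist() == via_size.values.tolist()
print(via_dummies)
print(same)
```

Both results are built from the original df here instead of overwriting it, so the two approaches can be compared side by side.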

If you need counts of the values instead:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})

print (df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

# Parentheses allow the method chain to span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0

Or:

# Run this on the original long-format df, not on the already reshaped result above.
df = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
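
As a further option that is not one of the solutions above, pd.crosstab should give the same count table, and clipping the counts at 1 turns them back into presence flags. This is only a sketch on the sample data with the duplicated 'a'; the names counts, presence, counts_flat and presence_flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# Cross-tabulate the (id1, id2) pairs against value; each cell is a row count.
counts = pd.crosstab([df['id1'], df['id2']], df['value'])

# Clip while the ids are still in the index, so only the counts are capped at 1.
presence = counts.clip(upper=1)

# Move the ids back to ordinary columns and drop the 'value' columns label.
counts_flat = counts.reset_index().rename_axis(None, axis=1)
presence_flat = presence.reset_index().rename_axis(None, axis=1)

print(counts_flat)
print(presence_flat)
```

The clip step is applied before reset_index on purpose: afterwards id1 and id2 would be regular columns and an id of 2 would be capped to 1 as well.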