pandas row values to column headers

Question

I have a DataFrame like this:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})

   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

I need to transform it into this form:

   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Each id can have anywhere from 1 to 10 distinct levels in the value column. If a level is not present for that id, the result should be 0, otherwise 1.

I am using Anaconda Python 3.5 on Windows 10.

Sorry for the confusion. I will only have one instance of each value, meaning that for each id there will be only one 'a'. I only need to check presence with binary values. Also, id1 and id2 will be exactly the same.
– Gowtham M, Commented Jun 24, 2017 at 5:33
Can you please help me with the case where I need the count? For example, if the row 2 1 1 c is changed to 2 1 1 a and I need to get the count.
– Gowtham M, Commented Jun 24, 2017 at 5:54

jezrael · Accepted Answer · 2017-06-24 05:55:29Z

If you only need 1 and 0 to mark the presence of a value:

You can use get_dummies on the Series created by set_index, but then groupby + GroupBy.max is necessary to collapse the rows of each id pair into one:

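# Parentheses let the method chain span several lines.
# dtype=int keeps the output as 1/0 (recent pandas versions return booleans by default).
df = (pd.get_dummies(df.set_index(['id1','id2'])['value'], dtype=int)
        .groupby(level=[0,1])
        .max()
        .reset_index())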
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Another solution uses groupby, size and unstack, but then it is necessary to compare with gt and convert to int with astype. Finally, apply reset_index and rename_axis:

# Parentheses let the method chain span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .gt(0)
        .astype(int)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
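
Both presence solutions start from the original frame, so they cannot be run one after the other on the reassigned df. The following is a minimal self-contained sketch that runs each on the question's data and compares the results. The names via_dummies and via_size are only illustrative, and dtype=int is an assumption made here so that newer pandas versions return integers instead of booleans:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# Approach 1: one-hot encode the values, then take the max per id pair
# so that the rows sharing the same ids collapse into one row.
via_dummies = (pd.get_dummies(df.set_index(['id1', 'id2'])['value'], dtype=int)
                 .groupby(level=[0, 1])
                 .max()
                 .reset_index())

# Approach 2: count rows per (id1, id2, value), spread the values into
# columns, then turn every positive count into a 1.
via_size = (df.groupby(['id1', 'id2', 'value'])
              .size()
              .unstack(fill_value=0)
              .gt(0)
              .astype(int)
              .reset_index()
              .rename_axis(None, axis=1))

print(via_dummies)
print(via_size)
```

Keeping the input in df and assigning each result to its own name makes it easy to try both variants side by side.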

If you need counts of the values:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})

print (df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

# Parentheses let the method chain span several lines.
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0

Or:

# Parentheses let the method chain span several lines.
df = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
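
The count table and the presence table are closely related: capping each count at 1 yields the 0/1 flags. A minimal sketch of that relationship, using the modified data above, might look like this. The use of clip here is a substitute for the gt(0).astype(int) step shown earlier, not part of the original solutions, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# Count rows per (id1, id2, value) and spread the values into columns.
counts = (df.groupby(['id1', 'id2', 'value'])
            .size()
            .unstack(fill_value=0))

# Capping every count at 1 turns the counts into presence flags.
presence = counts.clip(upper=1)

# Flatten both tables into the layout requested in the question.
counts_flat = counts.reset_index().rename_axis(None, axis=1)
presence_flat = presence.reset_index().rename_axis(None, axis=1)

print(counts_flat)
print(presence_flat)
```

Computing the counts once and deriving the flags from them avoids maintaining two separate pipelines when both outputs are needed.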
