Use column values of dataframe as index and columns name

Question

I'm working with a dataframe that looks like this:

    Client_ID   Product_ID   Cost
0      4            1         40
1      4            2         32
2      5            1         38
3      6            7         89
4      7            3         21
5      4            5         45
6      2            5         23
7      2            4         71
8      5            8         11
9      7            8         14

For each pair (Client_ID, Product_ID) there is only one occurrence/row in the dataframe.

I want to build a dataframe where Product_ID is the index, the column names are the Client_ID values, and the cost becomes the value in each cell. It would look like this:

                     Client_ID
Product_ID    1   2   3   4   5   6   7
   1          x   x   x  40  38   x   x
   2          x   x   x  32   x   x   x
   3          x   x   x   x   x   x  21
   4          x  71   x   x   x   x   x
   5          x  23   x  45   x   x   x
   6          x   x   x   x   x   x   x
   7          x   x   x   x   x  89   x
   8          x   x   x   x  11   x  14
   9          x   x   x   x   x   x   x
  10          x   x   x   x   x   x   x

I tried to achieve this with:

df.pivot(index='Product_ID', columns='Client_ID')

But it didn't work. I then tried making Product_ID the index first and pivoting afterwards:

df = df.set_index('Product_ID')
df.index.name = None
df.pivot(columns='Client_ID')

That did not work either.

Does somebody know how to achieve such a thing?

Thank you for your help.

Edit

The Product_ID values are strings.

df.pivot(index='Product_ID', columns='Client_ID', values='Cost').reindex(columns=np.arange(1, df.Client_ID.max() + 1)).fillna('x')
– cs95, Commented Nov 16, 2017 at 13:56

jezrael · Accepted Answer · 2017-11-16 14:10:32Z

It seems you need pivot + reindex to add the missing rows/columns:

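import numpy as np

# reindex by the union of the values in both ID columns
a = np.union1d(df['Client_ID'], df['Product_ID'])
# parentheses are required to continue the method chain on the next line
df = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
        .reindex(index=a, columns=a))
print(df)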
Client_ID    1     2   3     4     5     6     7   8
Product_ID                                          
1          NaN   NaN NaN  40.0  38.0   NaN   NaN NaN
2          NaN   NaN NaN  32.0   NaN   NaN   NaN NaN
3          NaN   NaN NaN   NaN   NaN   NaN  21.0 NaN
4          NaN  71.0 NaN   NaN   NaN   NaN   NaN NaN
5          NaN  23.0 NaN  45.0   NaN   NaN   NaN NaN
6          NaN   NaN NaN   NaN   NaN   NaN   NaN NaN
7          NaN   NaN NaN   NaN   NaN  89.0   NaN NaN
8          NaN   NaN NaN   NaN  11.0   NaN  14.0 NaN
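For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same pivot + reindex. The names `labels` and `wide` are only illustrative, they are not from the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Client_ID':  [4, 4, 5, 6, 7, 4, 2, 2, 5, 7],
    'Product_ID': [1, 2, 1, 7, 3, 5, 5, 4, 8, 8],
    'Cost':       [40, 32, 38, 89, 21, 45, 23, 71, 11, 14],
})

# Sorted union of every ID that appears in either column (1..8 here)
labels = np.union1d(df['Client_ID'], df['Product_ID'])

# Pivot to wide form, then reindex so IDs with no data show up as all-NaN rows/columns
wide = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
          .reindex(index=labels, columns=labels))

print(wide.shape)      # (8, 8)
print(wide.loc[1, 4])  # 40.0
```

The result is kept in a separate variable so the original long-format `df` stays available for trying the other variant.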

Or:

# starting again from the original df: reindex from 1 to the max value of each column
b = range(1, df['Client_ID'].max() + 1)
a = range(1, df['Product_ID'].max() + 1)
df = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
        .reindex(index=a, columns=b))
print(df)
Client_ID    1     2   3     4     5     6     7
Product_ID                                      
1          NaN   NaN NaN  40.0  38.0   NaN   NaN
2          NaN   NaN NaN  32.0   NaN   NaN   NaN
3          NaN   NaN NaN   NaN   NaN   NaN  21.0
4          NaN  71.0 NaN   NaN   NaN   NaN   NaN
5          NaN  23.0 NaN  45.0   NaN   NaN   NaN
6          NaN   NaN NaN   NaN   NaN   NaN   NaN
7          NaN   NaN NaN   NaN   NaN  89.0   NaN
8          NaN   NaN NaN   NaN  11.0   NaN  14.0

Detail (output of pivot alone, before reindex):

print(df.pivot(index='Product_ID', columns='Client_ID', values='Cost'))
Client_ID      2     4     5     6     7
Product_ID                              
1            NaN  40.0  38.0   NaN   NaN
2            NaN  32.0   NaN   NaN   NaN
3            NaN   NaN   NaN   NaN  21.0
4           71.0   NaN   NaN   NaN   NaN
5           23.0  45.0   NaN   NaN   NaN
7            NaN   NaN   NaN  89.0   NaN
8            NaN   NaN  11.0   NaN  14.0

Finally, replace the NaNs if necessary, but note that you then get mixed values, numeric together with strings:

df = df.fillna('x')
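To make the mixed-values caveat concrete, here is a small sketch on a hypothetical two-by-two frame (`wide` and `filled` are illustrative names, not from the answer). After filling with a string, a column that had NaNs is no longer numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted frame with one missing cell
wide = pd.DataFrame({4: [40.0, 32.0], 5: [38.0, np.nan]}, index=[1, 2])

filled = wide.fillna('x')

print(wide.dtypes.tolist())  # float64 for both columns before filling
print(filled[5].tolist())    # [38.0, 'x']
print(filled[5].dtype)       # object
```

So numeric operations such as `sum` on the filled column will no longer behave as they did on the NaN version; keeping NaN and only filling for display is one way around that.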

I have a constraint that I forgot to share: the Product_ID is a string :/
So what about df.pivot(index='Product_ID', columns='Client_ID', values='Cost').reindex(columns=b)?
I don't understand what the .reindex method does. Is it necessary? When I run your third solution, the one-liner alone, it works straight away...
Hmm, it is necessary if you need to add missing rows or columns to the data. If not, just use pivot alone.
But if you want to change that in the question, then the answer would have to be removed, because it would be a duplicate...
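Regarding the string constraint: a numeric range cannot be used for the rows when Product_ID holds strings. A minimal sketch, assuming hypothetical labels like 'P1' and a hypothetical list `all_products` of every product that should appear, would reindex the rows by that list and the columns by a client range:

```python
import pandas as pd

# Hypothetical data: Product_ID holds strings, Client_ID holds integers
df = pd.DataFrame({
    'Client_ID':  [4, 4, 5, 2],
    'Product_ID': ['P1', 'P2', 'P1', 'P4'],
    'Cost':       [40, 32, 38, 71],
})

# Assumed to be known in advance; it cannot be built with range() from strings
all_products = ['P1', 'P2', 'P3', 'P4']
all_clients = range(1, df['Client_ID'].max() + 1)

wide = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
          .reindex(index=all_products, columns=all_clients))

print(wide.shape)                    # (4, 5)
print(wide.loc['P4', 2])             # 71.0
print(wide.loc['P3'].isna().all())   # True, 'P3' has no rows in df
```

If no product is missing from the data, the `index=` argument can simply be dropped and only the columns reindexed.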
