I'm working with a dataframe that looks like this:

    Client_ID   Product_ID   Cost
0      4            1         40
1      4            2         32
2      5            1         38
3      6            7         89
4      7            3         21
5      4            5         45
6      2            5         23
7      2            4         71
8      5            8         11
9      7            8         14

Each (Client_ID, Product_ID) pair occurs in only one row of the dataframe.

I want to build a dataframe where Product_ID is the index, the column names are the client IDs, and the cost is the value in each cell. It would look like this:

                     Client_ID
Product_ID    1   2   3   4   5   6   7
   1          x   x   x  40  38   x   x
   2          x   x   x  32   x   x   x
   3          x   x   x   x   x   x  21
   4          x  71   x   x   x   x   x
   5          x  23   x  45   x   x   x
   6          x   x   x   x   x   x   x
   7          x   x   x   x   x  89   x
   8          x   x   x   x  11   x  14
   9          x   x   x   x   x   x   x
  10          x   x   x   x   x   x   x

I tried to achieve this by doing this:

df.pivot(index='Product_ID', columns='Client_ID')

But it didn't work. I then tried making Product_ID the index first and pivoting afterwards:

df = df.set_index('Product_ID')
df.index.name = None
df.pivot(columns='Client_ID')

That didn't work either.

Does somebody know how to achieve such a thing?

Thank you for your help.

Edit

The Product_ID values are strings.

    df.pivot(index='Product_ID', columns='Client_ID', values='Cost').reindex(columns=np.arange(1, df.Client_ID.max() + 1)).fillna('x') Commented Nov 16, 2017 at 13:56

1 Answer

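It seems you need pivot followed by reindex to add the missing rows/columns:

import numpy as np

# reindex by the union of the Client_ID and Product_ID values
a = np.union1d(df['Client_ID'], df['Product_ID'])
# the parentheses let the method chain span several lines
df = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
        .reindex(index=a, columns=a))
print(df)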
Client_ID    1     2   3     4     5     6     7   8
Product_ID                                          
1          NaN   NaN NaN  40.0  38.0   NaN   NaN NaN
2          NaN   NaN NaN  32.0   NaN   NaN   NaN NaN
3          NaN   NaN NaN   NaN   NaN   NaN  21.0 NaN
4          NaN  71.0 NaN   NaN   NaN   NaN   NaN NaN
5          NaN  23.0 NaN  45.0   NaN   NaN   NaN NaN
6          NaN   NaN NaN   NaN   NaN   NaN   NaN NaN
7          NaN   NaN NaN   NaN   NaN  89.0   NaN NaN
8          NaN   NaN NaN   NaN  11.0   NaN  14.0 NaN

Or:

# from 1 to the max value of each column
b = range(1, df['Client_ID'].max() + 1)
a = range(1, df['Product_ID'].max() + 1)
# the parentheses let the method chain span several lines
df = (df.pivot(index='Product_ID', columns='Client_ID', values='Cost')
        .reindex(index=a, columns=b))
print(df)
Client_ID    1     2   3     4     5     6     7
Product_ID                                      
1          NaN   NaN NaN  40.0  38.0   NaN   NaN
2          NaN   NaN NaN  32.0   NaN   NaN   NaN
3          NaN   NaN NaN   NaN   NaN   NaN  21.0
4          NaN  71.0 NaN   NaN   NaN   NaN   NaN
5          NaN  23.0 NaN  45.0   NaN   NaN   NaN
6          NaN   NaN NaN   NaN   NaN   NaN   NaN
7          NaN   NaN NaN   NaN   NaN  89.0   NaN
8          NaN   NaN NaN   NaN  11.0   NaN  14.0

Detail:

print (df.pivot(index='Product_ID', columns='Client_ID', values='Cost'))
Client_ID      2     4     5     6     7
Product_ID                              
1            NaN  40.0  38.0   NaN   NaN
2            NaN  32.0   NaN   NaN   NaN
3            NaN   NaN   NaN   NaN  21.0
4           71.0   NaN   NaN   NaN   NaN
5           23.0  45.0   NaN   NaN   NaN
7            NaN   NaN   NaN  89.0   NaN
8            NaN   NaN  11.0   NaN  14.0
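
For reference, here is a self-contained sketch of the steps above, assuming the sample data from the question with integer IDs. The names `wide`, `full`, `rows` and `cols` are introduced here only for illustration; keeping the original `df` untouched also lets both variants be run one after the other.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Client_ID":  [4, 4, 5, 6, 7, 4, 2, 2, 5, 7],
    "Product_ID": [1, 2, 1, 7, 3, 5, 5, 4, 8, 8],
    "Cost":       [40, 32, 38, 89, 21, 45, 23, 71, 11, 14],
})

# pivot only creates labels that actually occur in the data.
wide = df.pivot(index="Product_ID", columns="Client_ID", values="Cost")

# reindex adds the missing labels as all-NaN rows and columns.
rows = range(1, df["Product_ID"].max() + 1)
cols = range(1, df["Client_ID"].max() + 1)
full = wide.reindex(index=rows, columns=cols)

print(wide.shape)  # (7, 5)
print(full.shape)  # (8, 7)
```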

Finally, replace the NaNs if necessary, but note that you then get mixed values, numbers alongside strings:

df = df.fillna('x')
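
A small sketch of what "mixed values" means in practice, using a three-row subset of the question's data. It also shows one possible alternative that is not part of the answer above: keeping NaN in the data and showing 'x' only when printing, via the `na_rep` parameter of `DataFrame.to_string`.

```python
import pandas as pd

# Three rows from the question's data, enough to leave one empty cell.
df = pd.DataFrame({
    "Client_ID":  [4, 4, 5],
    "Product_ID": [1, 2, 1],
    "Cost":       [40, 32, 38],
})
wide = df.pivot(index="Product_ID", columns="Client_ID", values="Cost")

# Filling with a string turns the columns that contained NaN into
# object columns, so numeric operations on them stop working as before.
filled = wide.fillna("x")
print(wide.dtypes)
print(filled.dtypes)

# Alternative (an assumption about what is wanted): keep the numeric
# data and substitute 'x' only in the printed representation.
print(wide.to_string(na_rep="x"))
```

If the table is only meant for display, the second variant keeps the columns numeric for later sums or comparisons.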

8 Comments

I forgot to share one constraint: Product_ID is a string :/
So what about df.pivot(index='Product_ID', columns='Client_ID', values='Cost').reindex(columns=b)?
I don't understand what the .reindex method does, is it necessary? When I run your 3rd solution, the one liner alone, it works straight away...
Hmm, it is needed if you have to add missing rows or columns to the data. If not, just use pivot on its own.
But if you want to change that in the question, the answer would have to be removed, because it would be a dupe...
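
For the string Product_ID case raised in these comments, a minimal sketch follows. The labels "P1" to "P4" and the `products` list are made up for illustration; with string IDs there is no 1..max range to generate, so the full list of products has to come from somewhere else.

```python
import pandas as pd

# Same layout as the question's data, but with string product labels
# (hypothetical values, not taken from the question).
df = pd.DataFrame({
    "Client_ID":  [4, 4, 5, 2],
    "Product_ID": ["P1", "P2", "P1", "P4"],
    "Cost":       [40, 32, 38, 71],
})

# Client_ID is still numeric, so a 1..max range works for the columns.
clients = range(1, df["Client_ID"].max() + 1)

# Hypothetical complete product list, supplied by hand.
products = ["P1", "P2", "P3", "P4"]

wide = (
    df.pivot(index="Product_ID", columns="Client_ID", values="Cost")
      .reindex(index=products, columns=clients)
)
print(wide)
```

If no rows need to be added, passing only `columns=clients` to `reindex` leaves the index as pivot produced it.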