0

I have a dataframe where I want to convert the columns headers into aggregated rows and a column into the header:

Athlete  Variable    28.0   28.0  28.0  29.0   29.0   29.0

John     Minutes     23     32    43    21     43     35
John     Distance    23.2   32.7  54.2  24.2   47.8   32.1
John     Sprints     5      3     8     3      9      2
Bob      Minutes     25     23    43    31     38     23
Bob      Distance    32.4   35.6  43.7  21.8   26.8   12.5
Bob      Sprints     8      4     8     6      9      2

Desired dataframe:

Athlete   week    Minutes   Distance   Sprints

John      28      98        110.1      16
John      29      99        104.1      14
Bob       28      91        111.7      20
Bob       29      92        61.1       17

2 Answers 2

1

You’re looking for pivot_table, but you’ll need to melt the dataframe to a mono-dimensional format first:

>>> data = df.melt(id_vars=['Athlete', 'Variable'], var_name='week')
>>> data
   Athlete  Variable  week  value
0     John   Minutes  28.0   23.0
1     John  Distance  28.0   23.2
2     John   Sprints  28.0    5.0
3      Bob   Minutes  28.0   25.0
4      Bob  Distance  28.0   32.4
5      Bob   Sprints  28.0    8.0
6     John   Minutes  28.0   32.0
7     John  Distance  28.0   32.7
8     John   Sprints  28.0    3.0
9      Bob   Minutes  28.0   23.0
10     Bob  Distance  28.0   35.6
11     Bob   Sprints  28.0    4.0
12    John   Minutes  28.0   43.0
13    John  Distance  28.0   54.2
14    John   Sprints  28.0    8.0
15     Bob   Minutes  28.0   43.0
16     Bob  Distance  28.0   43.7
17     Bob   Sprints  28.0    8.0
18    John   Minutes  29.0   21.0
19    John  Distance  29.0   24.2
20    John   Sprints  29.0    3.0
21     Bob   Minutes  29.0   31.0
22     Bob  Distance  29.0   21.8
23     Bob   Sprints  29.0    6.0
24    John   Minutes  29.0   43.0
25    John  Distance  29.0   47.8
26    John   Sprints  29.0    9.0
27     Bob   Minutes  29.0   38.0
28     Bob  Distance  29.0   26.8
29     Bob   Sprints  29.0    9.0
30    John   Minutes  29.0   35.0
31    John  Distance  29.0   32.1
32    John   Sprints  29.0    2.0
33     Bob   Minutes  29.0   23.0
34     Bob  Distance  29.0   12.5
35     Bob   Sprints  29.0    2.0

So using pivot_table, you have to specify the aggregation function, here we use sum, the default being mean:

>>> data.pivot_table(index=['Athlete', 'week'], columns='Variable', values='value', aggfunc='sum')
Variable      Distance  Minutes  Sprints
Athlete week                            
Bob     28.0     111.7     91.0     20.0
        29.0      61.1     92.0     17.0
John    28.0     110.1     98.0     16.0
        29.0     104.1     99.0     14.0
Sign up to request clarification or add additional context in comments.

Comments

0

Try with groupby first then do pivot and stack

out = df.groupby(level=0,axis=1).sum().pivot('Athlete','Variable').\
          rename_axis(['week',None],axis=1).stack(level=0).reset_index()
Out[466]: 
  Athlete  week  Distance  Minutes  Sprints
0     Bob  28.0     111.7     91.0     20.0
1     Bob  29.0      61.1     92.0     17.0
2    John  28.0     110.1     98.0     16.0
3    John  29.0     104.1     99.0     14.0

3 Comments

Doesn’t seem to work for me. Either Athlete and Variable are already in the index, then pivot fails, or they’re not, then the groupby returns an empty dataframe.
@Cimbali you can do df = df.reset_index(), then apply above method
the .reset_index() needs to be between the .sum() and .pivot(), or you could use df.groupby(level=0,axis=1).sum().stack().unstack('Variable')

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.