Pivot a pandas dataframe with multiple columns

Question

I have a sample data frame like the one below:

import pandas as pd

df1 = pd.DataFrame({
    'Gender': ['Male','Male','Male','Male','Female','Female','Female','Female','Male','Male','Male','Male','Female','Female','Female','Female'],
    'Year': [2008,2008,2009,2009,2008,2008,2009,2009,2008,2008,2009,2009,2008,2008,2009,2009],
    'rate': [2.3,3.2,4.5,6.7,5.6,3.2,3.5,2.6,2.3,3.2,4.5,6.7,5.6,3.2,3.5,2.6],
    'Heading': ['TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123','TNMAB123',
                'TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456','TNMAB456'],
    'target': [31.2,33.4,33.4,35.2,35.2,36.4,36.4,37.2,31.2,33.4,33.4,35.2,35.2,36.4,36.4,37.2],
    'day_type': ['wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend','wk','wkend'],
})

I would like to pivot it into the layout described further down, but the following code throws an error:

df1.pivot(index='Year', columns='Heading', values='rate')
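
The error text is not included in the question. As a hedged illustration of the likely cause: `DataFrame.pivot` cannot reshape when the same index/column pair occurs more than once, and in the sample data each Year/Heading pair does. A minimal sketch on a cut-down frame (the four rows are a subset of the sample above; the variable name `pivot_error` is only for illustration):

```python
import pandas as pd

# Cut-down version of the sample data: each (Year, Heading) pair appears twice.
df = pd.DataFrame({
    'Year': [2008, 2008, 2009, 2009],
    'Heading': ['TNMAB123', 'TNMAB123', 'TNMAB123', 'TNMAB123'],
    'rate': [2.3, 3.2, 4.5, 6.7],
})

try:
    df.pivot(index='Year', columns='Heading', values='rate')
    pivot_error = None
except ValueError as exc:
    # pivot() does no aggregation, so duplicate pairs cannot be placed in one cell.
    pivot_error = str(exc)

print(pivot_error)
```

`pivot_table`, used in the next attempt, aggregates such duplicates instead of raising.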

With the help of an SO post I wrote the code below, but I am not sure how to make it work with three index columns:

df1 = df1.pivot_table(index=['Year','Gender','day_type'],columns='Heading',values='rate').unstack()
df1.columns = ['_'.join(i) for i in df1.columns.tolist()]

I expect the output to have one row per year, with all the corresponding entries for that year as columns.

Please note that I haven't filled in the values, as the column structure of the table is what matters.

BENY · Accepted Answer · 2020-05-30 15:15:55Z


Try with map; you also need to unstack two levels:

df1 = df1.pivot_table(index=['Year','Gender','day_type'],columns='Heading',values='rate').unstack([1,2])
df1.columns=df1.columns.map('_'.join)
df1
      TNMAB123_Female_wk  ...  TNMAB456_Male_wkend
Year                      ...                     
2008                 5.6  ...                  3.2
2009                 3.5  ...                  6.7
[2 rows x 8 columns]
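
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It rebuilds the question's data in compact form (the `target` column is left out because it is not used), and it unstacks by level name rather than by position, which should be equivalent to `unstack([1,2])` here. The names `df` and `wide` are just illustrative.

```python
import pandas as pd

# Same rows as the question's df1, written compactly (without 'target').
df = pd.DataFrame({
    'Gender': (['Male'] * 4 + ['Female'] * 4) * 2,
    'Year': [2008, 2008, 2009, 2009] * 4,
    'rate': [2.3, 3.2, 4.5, 6.7, 5.6, 3.2, 3.5, 2.6] * 2,
    'Heading': ['TNMAB123'] * 8 + ['TNMAB456'] * 8,
    'day_type': ['wk', 'wkend'] * 8,
})

# Pivot on Heading, then move Gender and day_type from the row index
# into the columns so that only Year remains as the row index.
wide = (
    df.pivot_table(index=['Year', 'Gender', 'day_type'],
                   columns='Heading', values='rate')
      .unstack(['Gender', 'day_type'])
)

# Flatten the 3-level column MultiIndex: (Heading, Gender, day_type) -> 'Heading_Gender_day_type'
wide.columns = wide.columns.map('_'.join)

print(wide.shape)
print(list(wide.columns))
```

The join works because every column level holds strings; with numeric levels you would need to convert them to `str` first.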


6 Comments

The Great Over a year ago

Fantastic, it works. The only issue is that a few of my columns appear twice in the result: there is an extra copy of the same column, but with empty/blank values. Upvoted.

BENY Over a year ago

df1.pivot_table(index=['Year','Gender','day_type'],columns='Heading',values='rate',aggfunc='sum') @TheGreat

The Great Over a year ago

May I know why we need aggfunc='sum'?

BENY Over a year ago

@TheGreat pivot only reshapes; with aggfunc, duplicate entries are aggregated with sum. Alternatively, you can do df1=df1.sum(level=0,axis=1)
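
To make the aggfunc point concrete, here is a small sketch on made-up numbers (not the question's data). By default `pivot_table` averages duplicate entries, while `aggfunc='sum'` adds them up. The second half shows one way to collapse duplicate column labels; since `DataFrame.sum(level=...)` was removed in pandas 2.0, it uses a transpose plus `groupby` instead, which is my substitution rather than what the comment proposes.

```python
import pandas as pd

# Made-up data: the (2008, 'TNMAB123') pair occurs twice.
df = pd.DataFrame({
    'Year': [2008, 2008, 2009],
    'Heading': ['TNMAB123', 'TNMAB123', 'TNMAB123'],
    'rate': [2.0, 4.0, 5.0],
})

# Default aggregation is the mean of the duplicates.
mean_table = df.pivot_table(index='Year', columns='Heading', values='rate')
# With aggfunc='sum' the duplicates are added together.
sum_table = df.pivot_table(index='Year', columns='Heading', values='rate', aggfunc='sum')

# Collapsing duplicate column labels after the fact:
dup = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['a', 'a', 'b'])
# Transpose so labels become the row index, sum rows sharing a label, transpose back.
collapsed = dup.T.groupby(level=0).sum().T

print(mean_table)
print(sum_table)
print(collapsed)
```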

The Great Over a year ago

No, the issue right now is this: for example, I have two columns, realdata123_Female_wk and realdata123 _Female_wk. Note the gap before the underscore. The former has proper values, whereas the latter (with the stray space) is empty.
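
The thread does not resolve this, so the following is only a guess at the cause: if some `Heading` values carry stray whitespace, they are treated as distinct headings and produce near-duplicate columns. A minimal sketch, assuming that is what happened (the heading values and rates are made up), strips the whitespace before pivoting:

```python
import pandas as pd

# Hypothetical data: the second Heading has a trailing space.
df = pd.DataFrame({
    'Year': [2008, 2008],
    'Gender': ['Female', 'Female'],
    'day_type': ['wk', 'wk'],
    'Heading': ['realdata123', 'realdata123 '],
    'rate': [1.5, 2.5],
})

before = df['Heading'].nunique()            # distinct headings before cleaning
df['Heading'] = df['Heading'].str.strip()   # remove leading/trailing whitespace
after = df['Heading'].nunique()             # distinct headings after cleaning

wide = (
    df.pivot_table(index=['Year', 'Gender', 'day_type'],
                   columns='Heading', values='rate', aggfunc='sum')
      .unstack(['Gender', 'day_type'])
)
wide.columns = wide.columns.map('_'.join)

print(before, after)
print(list(wide.columns))
```

Once the headings are cleaned, the two rows land in the same column and are combined by `aggfunc`.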
