Sort by both index and value in Multi-indexed data of Pandas dataframe

Question

Suppose I have a dataframe as below:

    year    month   message
0   2018    2   txt1
1   2017    4   txt2
2   2019    5   txt3
3   2017    5   txt5
4   2017    5   txt4
5   2020    4   txt3
6   2020    6   txt3
7   2020    6   txt3
8   2020    6   txt4

I want to find the top three months by number of messages in each year, so I grouped the data as below:

df.groupby(['year','month']).count()

which results in:

            message
year    month   
2017    4   1
        5   2
2018    2   1
2019    5   1
2020    4   1
        6   3

The data is in ascending order for both index levels. How can I get the result shown below, where the data is sorted by year (ascending) and count (descending), for the top n values? The order of the 'month' index does not matter.

            message
year    month   
2017    5   2
        4   1
2018    2   1
2019    5   1
2020    6   3
        4   1

Quang Hoang · Accepted Answer · 2020-03-09 12:55:53Z

2

value_counts sorts by count in descending order by default:

df.groupby('year')['month'].value_counts()

Output:

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64

If you want only the top 2 values for each year, do another groupby:

(df.groupby('year')['month'].value_counts()
   .groupby('year').head(2)
)

Output:

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64
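A self-contained sketch of the same approach, using the sample data from the question. The helper name top_n_months is hypothetical and only there for illustration; with n=1 it becomes visible that head really limits the rows per year, which the sample data hides for n=2 because no year has more than two distinct months.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":  [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4",
                "txt3", "txt3", "txt3", "txt4"],
})


def top_n_months(frame, n):
    """Hypothetical helper: (year, month, count) for the n busiest months per year."""
    # value_counts orders each year's months by count, highest first.
    counts = frame.groupby("year")["month"].value_counts()
    # head(n) keeps the first n rows of every year and preserves their order.
    top = counts.groupby(level="year").head(n)
    return [(int(y), int(m), int(c)) for (y, m), c in top.items()]


print(top_n_months(df, 2))
print(top_n_months(df, 1))
```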


2 Comments

taminur Over a year ago

Thank you very much. This is what I was looking for.

anky Over a year ago

We can also chain head and value_counts with apply: df.groupby('year')['month'].apply(lambda x: x.value_counts().head(2))

ywbaek · Accepted Answer · 2020-03-09 12:41:46Z

2

This will sort by year (ascending) and count (descending).

df = df.groupby(['year', 'month']).count().sort_values(['year', 'message'], ascending=[True, False])
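As a rough, runnable sketch of this one-liner on the question's sample data: sort_values accepts the index level name 'year' alongside the column 'message', so no reset_index is needed. The variable names counts, ordered and top2 are illustrative only, and the final head(2) step is an assumption about how one might keep the top rows per year.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":  [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4",
                "txt3", "txt3", "txt3", "txt4"],
})

# Count messages per (year, month); the result has a two-level index.
counts = df.groupby(["year", "month"]).count()

# 'year' is an index level and 'message' is a column; both can appear in `by`.
ordered = counts.sort_values(["year", "message"], ascending=[True, False])

# Keep at most two rows per year, in the order established above.
top2 = ordered.groupby(level="year").head(2)

print(ordered)
```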


2 Comments

taminur Over a year ago

Thanks, it seems to be working. Actually, I have another part: how can I limit my result to the top 2 values for each year?

ywbaek Over a year ago

You can group the df again by 'year' and apply head(n), where n is the number of rows you want to return for each year: df = df.groupby('year').head(2)

yatu · Accepted Answer · 2020-03-09 11:59:31Z

1

You can use sort_index, specifying ascending=[True,False] so that only the second level is sorted in descending order:

df = df.groupby(['year','month']).count().sort_index(ascending=[True,False])

              message
year month         
2017 5            2
     4            1
2018 2            1
2019 5            1
2020 6            3
     4            1


2 Comments

ywbaek Over a year ago

This won't sort "count" in descending order.

taminur Over a year ago

@YoungWookBa you are right. Unfortunately, it's not working.

yazdanimehdi · Accepted Answer · 2020-03-09 12:40:23Z

1

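Here you go. The second sort needs to be stable (kind='mergesort') so that the descending count order is kept within each year:

df.groupby(['year', 'month']).count().sort_values(axis=0, ascending=False, by='message').sort_values(axis=0, ascending=True, by='year', kind='mergesort')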


1 Comment

taminur Over a year ago

Thank you very much, it seems to be working. How can I limit my result to, say, the top 2 values for each year?

yazdanimehdi · Accepted Answer · 2020-03-09 12:11:14Z

0

You can use this code:

df.groupby(['year', 'month']).count().sort_index(axis=0, ascending=False).sort_values(by="year", ascending=True)


1 Comment

taminur Over a year ago

Tried it. It's not sorting 'count' in descending order.
