Let's say I have the following multi-index DataFrame:

import pandas as pd
df = pd.DataFrame({'Index0':[0,1,2,3,4,5],'Index1':[100,200,300,400,500,600],'A':[5,2,5,8,1,2]})
# Move Index0 and Index1 into a two-level row MultiIndex
df = df.set_index(['Index0','Index1'])

Now I want to select all the rows where Index1 is less than 400. If Index1 were a regular column, this would be simple:

df[df['Index1'] < 400]

One method would be to call reset_index, perform the selection, and then set the index again, but that seems redundant.
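
For reference, the round trip described above might look like the following. This is only a sketch, assuming df is built as in the question with Index0 and Index1 already set as index levels; the names flat and roundtrip are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Index0': [0, 1, 2, 3, 4, 5],
    'Index1': [100, 200, 300, 400, 500, 600],
    'A': [5, 2, 5, 8, 1, 2],
}).set_index(['Index0', 'Index1'])

# Move both index levels back into regular columns
flat = df.reset_index()
# Filter on what is now an ordinary column
flat = flat[flat['Index1'] < 400]
# Rebuild the MultiIndex from the same two columns
roundtrip = flat.set_index(['Index0', 'Index1'])
print(roundtrip)
```

It works, but it needs three steps and two intermediate copies for what is conceptually a single filter.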

My question is: is there a way to do this directly? And how can it be done when the DataFrame has a row MultiIndex?

  • Oops. Forgot df.set_index(['Index0','Index1']) in the code. Commented Jun 10, 2018 at 15:58
  • Ahem, that should have been `df.set_index(['Index0','Index1'],inplace=True)`. Commented Jun 10, 2018 at 16:08

1 Answer

The simplest approach here is to use query, which can refer to index levels by name:

df1 = df.query('Index1 < 400')
print (df1)
               A
Index0 Index1   
0      100     5
1      200     2
2      300     5

Or use get_level_values to select one level of the MultiIndex and filter with boolean indexing:

df1 = df[df.index.get_level_values('Index1') < 400]

Detail:

print (df.index.get_level_values('Index1'))
Int64Index([100, 200, 300, 400, 500, 600], dtype='int64', name='Index1')
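
As a quick check that the two approaches above select the same rows, here is a minimal sketch. It assumes df is rebuilt as in the question with set_index applied; the names via_query, mask and via_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Index0': [0, 1, 2, 3, 4, 5],
    'Index1': [100, 200, 300, 400, 500, 600],
    'A': [5, 2, 5, 8, 1, 2],
}).set_index(['Index0', 'Index1'])

# query resolves 'Index1' against the index level of that name
via_query = df.query('Index1 < 400')

# Comparing the level values yields one boolean per row
mask = df.index.get_level_values('Index1') < 400
via_mask = df[mask]

# Both results keep the MultiIndex and contain the same rows
print(via_query.equals(via_mask))
```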

If the levels have no names, select them by position. In query, use the special keyword ilevel_ followed by the position:

df.index.names = [None, None]
print (df)
       A
0 100  5
1 200  2
2 300  5
3 400  8
4 500  1
5 600  2

df1 = df.query('ilevel_1 < 400')

df1 = df[df.index.get_level_values(1) < 400]
print (df1)
       A
0 100  5
1 200  2
2 300  5
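
Since get_level_values accepts either a level name or an integer position, the boolean-indexing variant can be wrapped in a small function that covers both the named and the unnamed case. The sketch below is only an illustration: filter_by_level is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def filter_by_level(frame, level, predicate):
    """Hypothetical helper: keep rows whose index level satisfies predicate.

    level may be a level name or an integer position, because
    Index.get_level_values accepts either.
    """
    values = frame.index.get_level_values(level)
    return frame[predicate(values)]

# MultiIndex without level names
df = pd.DataFrame(
    {'A': [5, 2, 5, 8, 1, 2]},
    index=pd.MultiIndex.from_arrays(
        [[0, 1, 2, 3, 4, 5], [100, 200, 300, 400, 500, 600]]
    ),
)
by_position = filter_by_level(df, 1, lambda v: v < 400)

# The same call works by name once the levels are named
df.index.names = ['Index0', 'Index1']
by_name = filter_by_level(df, 'Index1', lambda v: v < 400)
print(by_name)
```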

1 Comment

Many thanks @jezrael for the prompt and complete answer. I can see now why query is convenient. I should have known the second method.