3

Assume you have a pandas DataFrame with a MultiIndex. You want to get all the rows whose index has a particular value in one of its levels. How do you do this?

My first thought was a boolean mask...

df[df.index.labels == 1].head()

but this does not work.

Thanks!

6
  • You can convert the index back to columns and then filter. That certainly works with a single index. It should work with a MultiIndex too, but I am not sure. Commented Jul 23, 2016 at 22:19
  • 2
    Show the dataframe you're working with. Commented Jul 23, 2016 at 23:20
  • Why the downvote? Is this clearly documented somewhere? Is it unclear? Is it not helpful? It would have helped me obviously meta.stackoverflow.com/questions/252677/… Commented Jul 23, 2016 at 23:45
  • 3
    I wasn't the one who downvoted, nor do I know who did. But I can say that I've seen this question many times, and it has been answered many times. Try stackoverflow.com/search?q=pandas+filter+rows. Someone probably didn't think you put enough effort into the research. If you hover over the downvote button, it says "this question doesn't show any research effort". Hope that helps. Commented Jul 24, 2016 at 0:57
  • 1
    You don't have a sample dataframe to work on. Commented Jul 24, 2016 at 9:21

3 Answers

3

I would use xs (cross-section):

In [11]: df = pd.DataFrame([[1, 2, 3], [3, 4, 5]], columns=list("ABC")).set_index(["A", "B"])

In [12]: df
Out[12]:
     C
A B
1 2  3
3 4  5

Then you can select the rows where level A equals 1:

In [13]: df.xs(key=1, level="A")
Out[13]:
   C
B
2  3

Passing drop_level=False applies the same filter without dropping level A from the index:

In [14]: df.xs(key=1, level="A", drop_level=False)
Out[14]:
     C
A B
1 2  3
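
As a self-contained sketch of the two xs calls above: the frame below is the same one with one extra row added (an assumption of mine, so that the filter matches more than one row), and the names `dropped` and `kept` are illustrative only.

```python
import pandas as pd

# Same layout as above, plus a hypothetical third row so that A == 1 matches twice
df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [1, 6, 7]], columns=list("ABC")
).set_index(["A", "B"])

# Default behaviour: level A is removed from the result's index
dropped = df.xs(key=1, level="A")

# drop_level=False keeps the full MultiIndex on the filtered rows
kept = df.xs(key=1, level="A", drop_level=False)

print(dropped)
print(kept)
```

The second form is usually the one you want when the result will be concatenated or joined back against frames that still carry both levels.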

2

You need to specify which index level to use. In my example I took the second level (my object is called s because that is the name used on the MultiIndex page of the pandas documentation):

s[s.index.labels[1]==1]

You can see how the index is constructed by typing:

s.index

The resulting structure is:

MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], [1, 2]],
       labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
       names=['first', 'second'])

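The full code is below. Note that labels holds integer positions into levels rather than the level values themselves, so a label of 1 in the second level selects the rows where second equals 2: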

>>> import pandas as pd
>>> import numpy as np
>>> arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
...           [1, 2, 1, 2, 1, 2, 1, 2]]
... 
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
>>> s = pd.Series(np.random.randn(8), index=index)
>>> s[s.index.labels[1]==1]
first  second
bar    2        -0.304029
baz    2        -1.216370
foo    2         1.401905
qux    2        -0.411468
dtype: float64
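
In pandas 0.24 and later the `labels` attribute is called `codes`, and `labels` was removed in 1.0, so the snippet above only runs on older releases. The following is a minimal sketch of the same filter on a current pandas, assuming the same index as above. I use `np.arange` instead of random data so the result is reproducible, and the names `by_code` and `by_value` are illustrative only.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.arange(8), index=index)

# codes[1] holds integer positions into levels[1], which is [1, 2],
# so code 1 corresponds to the value 2 in the 'second' level
by_code = s[s.index.codes[1] == 1]

# Filtering on the actual level value gives the same rows
by_value = s[s.index.get_level_values('second') == 2]

print(by_code)
```

Filtering by value is usually less error-prone, because the codes depend on how the levels happen to be ordered.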

1

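An alternative solution is to build a boolean mask with get_level_values, which accepts either a level name or its position: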

In [62]: df = pd.DataFrame({'idx1': ['A','B','C'], 'idx2':[1,2,3], 'val': [30,10,20]}).set_index(['idx1','idx2'])

In [63]: df
Out[63]:
           val
idx1 idx2
A    1      30
B    2      10
C    3      20

In [64]: df[df.index.get_level_values('idx2') == 2]
Out[64]:
           val
idx1 idx2
B    2      10

In [65]: df[df.index.get_level_values(1) == 2]
Out[65]:
           val
idx1 idx2
B    2      10
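
Because get_level_values returns an ordinary Index, the mask can be extended in the usual ways. Here is a small sketch, assuming the same frame as above; the names `mask`, `subset` and `both` are mine, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'idx1': ['A', 'B', 'C'], 'idx2': [1, 2, 3], 'val': [30, 10, 20]}
).set_index(['idx1', 'idx2'])

# Match several values of one level at once
mask = df.index.get_level_values('idx2').isin([1, 3])
subset = df[mask]

# Combine conditions on two different levels with &
both = df[(df.index.get_level_values('idx1') == 'C')
          & (df.index.get_level_values('idx2') == 3)]

print(subset)
print(both)
```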
