4

Question about selecting rows by MultiIndex.

df:

              colA   ColB

   A1   B1   1  ca1  cb1
             2  ca2  cb2
        B2   1  ca3  cb3
             2  ca4  cb4 
   A2   B1   1  ca5  cb5
             2  ca6  cb6
        B2   1  ca7  cb7
             2  ca8  cb8

I want to select colA from A2/B1 where MultiIndex levels(2) > 1. I'm doing it in multiple steps:

df1=df.loc[A2,[B7],:,]  

for some reason I have to present level(1) in brackets

df1.index = x1.index.droplevel(0)

df1.index = x1.index.droplevel(0)

df1.loc[1:,'colA']

I know it is way too much but I cant figure how to do it simpler. Ways I can think of don't want to work.

3
  • Possible duplicate of How to query MultiIndex index columns values in pandas Commented Jan 6, 2018 at 18:58
  • ^ index in the dupe above is a named index, so it doesn't apply here. Commented Jan 6, 2018 at 19:05
  • @Ranny, please post some reproducible data next time. You can do it easily with df.to_dict(). Thanks! Commented Jan 6, 2018 at 19:30

2 Answers 2

6

Since you're working with unnamed index columns, one way is using query on ilevel_* -

df.query("ilevel_0 == 'A2' and ilevel_1 == 'B1' and ilevel_2 > 1")

        colA ColB
A2 B1 2  ca6  cb6

Another method is using index.get_level_values -

f = df.index.get_level_values
df.loc[(f(0) == 'A2') & (f(1) == 'B1') & (f(2) > 1)]

        colA ColB
A2 B1 2  ca6  cb6
Sign up to request clarification or add additional context in comments.

5 Comments

Before I try. I know that query is little slow. What will change if I add names to index levels?
@Ranny Not much, actually... just the string expression. If you don't want query, loc should be pretty fast.
Renaming df.index.get_level_values to f is pretty cool! +1
Thanks for all help. I will go with 'f' method, but will keep other two in arsenal. I just have to get rid of first two levels of MultiIndex. I think reindex will do it just fine. Thanks again.
@Ranny You're welcome. Don't forget to vote on our answers, and accept the most helpful one.
5

You can also do it in two operations. First use loc to select parts in A2 and B1, and then do another query at the remaining index, by setting query('index > 1').

df = pd.DataFrame({'ColB': {('A1', 'B1', 1): 'cb1',
  ('A1', 'B1', 2): 'cb2',
  ('A1', 'B2', 1): 'cb3',
  ('A1', 'B2', 2): 'cb4',
  ('A2', 'B1', 1): 'cb5',
  ('A2', 'B1', 2): 'cb6',
  ('A2', 'B2', 1): 'cb7',
  ('A2', 'B2', 2): 'cb8'},
 'colA': {('A1', 'B1', 1): 'ca1',
  ('A1', 'B1', 2): 'ca2',
  ('A1', 'B2', 1): 'ca3',
  ('A1', 'B2', 2): 'ca4',
  ('A2', 'B1', 1): 'ca5',
  ('A2', 'B1', 2): 'ca6',
  ('A2', 'B2', 1): 'ca7',
  ('A2', 'B2', 2): 'ca8'}})

# Answer here
df.loc["A2", "B1"].query("index > 1")
Output:
    ColB    colA
 2  cb6     ca6

5 Comments

I like this one (because I understand it better). It works, just I need to sort index before. The only thing I cant pass variable inside quotation marks for querry. How do do this? a=1; "index > a"
@Ranny .query("index > @a")
@Ranny mixing loc and query isn't as clean as using either though. :-)
Thanks Tai. Actually I will go with your solution, there is quite big difference in %timeit. 3.6 vs 71.6 (without f assignment). And I'll remember about .to_dict() next time.
@Ranny Cool. Thanks for letting me know about your experiments.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.