Despite there being at least two good tutorials on how to index a DataFrame in Python's pandas library, I still can't work out an elegant way of SELECTing on more than one column.

>>> import pandas as pd
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have found (what I think is) a rather inelegant way of doing it, like this:

>>> d[d['x']>2][d['y']>7]

But it's not pretty, and it scores fairly low for readability (I think).

Is there a better, more Python-tastic way?

2 Answers

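It is an operator precedence issue: & binds more tightly than >, so the unparenthesized expression is parsed as d['x'] > (2 & d['y']) > 7.

You should add extra parentheses to make your multi-condition test work: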

d[(d['x']>2) & (d['y']>7)]

This section of the tutorial you mentioned shows an example with several boolean conditions, and parentheses are used there.
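
To make the precedence point concrete, here is a minimal sketch that reuses the DataFrame from the question. The names unparenthesized_error, mask and result are only illustrative, and the exact error text may differ between pandas versions:

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Without parentheses, & binds more tightly than >, so Python reads this as
# the chained comparison  d['x'] > (2 & d['y']) > 7,  which needs bool(Series).
try:
    d[d['x'] > 2 & d['y'] > 7]
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

# With parentheses, each comparison yields a boolean Series first,
# and & then combines the two masks element-wise.
mask = (d['x'] > 2) & (d['y'] > 7)
result = d[mask]
print(result)
```

Naming the mask before indexing is optional, but it can help readability when there are more than two conditions.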

1 Comment

note: the bitwise operator & is required here. d[(d['x']>2) and (d['y']>7)] fails with ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
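
A short sketch of the difference described in this comment, assuming the same DataFrame as in the question. The variable names are hypothetical, and | and ~ are shown only as the element-wise counterparts of or and not:

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Python's "and" asks for the truth value of the whole left-hand Series,
# which pandas refuses to guess, so this raises ValueError.
try:
    d[(d['x'] > 2) and (d['y'] > 7)]
    and_error = None
except ValueError as exc:
    and_error = exc

# The bitwise operators work element-wise on boolean Series instead.
both = d[(d['x'] > 2) & (d['y'] > 7)]    # rows matching both conditions
either = d[(d['x'] > 4) | (d['y'] < 5)]  # rows matching at least one
negated = d[~(d['x'] > 2)]               # rows where the condition is False
```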

There may still be a better way, but

In [56]: d[d['x'] > 2] and d[d['y'] > 7]
Out[56]: 
   x  y
4  5  8

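works in the pandas version used here, but only by coincidence: Python's and returns its second operand whenever the first is truthy, so the result is simply d[d['y'] > 7]. Recent pandas versions raise a ValueError when a DataFrame is used in a boolean context.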

3 Comments

this works, but ends up using python operators (rather than numpy) and so is going to be much slower
that's a nice solution. I like the fact that it explicitly uses and. Makes it clearer that there are two conditions being evaluated.
Oh, I've just found a duplicate of this question. Whoops.
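
For readers who, like the commenter above, prefer an explicit and, DataFrame.query takes the condition as a string and applies and/or element-wise. This is a minimal sketch, assuming a pandas version that provides DataFrame.query; the names selected and expected are illustrative:

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Inside the query string, "and" is applied element-wise to the columns,
# so no extra parentheses are needed around each comparison.
selected = d.query('x > 2 and y > 7')

# The parenthesized boolean-mask form should select the same rows.
expected = d[(d['x'] > 2) & (d['y'] > 7)]
print(selected.equals(expected))
```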
