
NB: A question similar to this has been asked before but it doesn't exactly answer my question.

How do I subset a pandas DataFrame with many columns, based on a large number of those columns each satisfying some boolean condition?

Right now, I'd have to do something like:

df[(df.column4 > a1) | (df.column23 < a2) | (df.column27 == a3) | ... 
    (df.column56 > a21) | (df.column72 < a22)]

Thanks

  • Well is there any sort of regularity to your conditions? Commented Jan 20, 2017 at 23:25
  • @Lagerbaer No. But supposing all the conditions were '>'. Is there a way for that even? I can't applymap a lambda function on columns and get only the required rows. Commented Jan 20, 2017 at 23:29
  • Well if you'd have a list of tuples (column name and condition) then you could iterate over that list and incrementally build up the condition to be used within the filter. Commented Jan 20, 2017 at 23:30
  • @Lagerbaer Right. But that's expensive. Commented Jan 20, 2017 at 23:33
  • If your condition is the same, then you can pass a list of columns and compare the entire thing: df[list_of_cols] > some_val gives you a mask; you can then use this mask on the original df: df[df[list_of_cols] > some_val]. Note that it must be a real list, df[['col1','col2',...]] and not df['col1','col2',...], because the former is a list of column labels while the latter will be treated as a tuple and will raise a KeyError, since pandas will try to find a column named 'col1','col2'. Commented Jan 20, 2017 at 23:36
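To make the suggestion in the last comment concrete, here is a small sketch (with made-up column names and data) of the same-condition case. One detail worth spelling out: df[list_of_cols] > some_val produces a DataFrame of booleans, one per cell, so it needs .any(axis=1) to collapse it into a single boolean per row before it can select rows.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col1': [1, 6, 3],
                   'col2': [7, 2, 4],
                   'col3': [0, 0, 8]})

# Same condition (>) applied across several columns at once
list_of_cols = ['col1', 'col2']
some_val = 5

# df[list_of_cols] > some_val is a boolean DataFrame (one value per cell);
# .any(axis=1) collapses it to one boolean per row, True if any column passes
row_mask = (df[list_of_cols] > some_val).any(axis=1)
result = df[row_mask]
```

Use .all(axis=1) instead of .any(axis=1) if the rows should satisfy the condition in every listed column rather than in at least one.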

1 Answer


You'll have to specify your conditions one way or another. You can create individual masks for each condition which you eventually reduce to a single one:

import seaborn as sns
import operator
import numpy as np

# Load a sample dataframe to play with
df = sns.load_dataset('iris')

# Define individual conditions as tuples:
# (column, compare_function, compare_value)
cond1 = ('sepal_length', operator.gt, 5)
cond2 = ('sepal_width', operator.lt, 2)
cond3 = ('species', operator.eq, 'virginica')
conditions = [cond1, cond2, cond3]

# Apply those conditions to the df, creating a list of 3 boolean masks
masks = [fn(df[var], val) for var, fn, val in conditions]
# Reduce those 3 masks to a single one using logical OR
mask = np.logical_or.reduce(masks)

result = df.loc[mask]

(Note: seaborn.apionly and df.ix are gone from current versions of seaborn and pandas; import seaborn as sns and df.loc[mask] do the same job.)

When we compare this with the "hand-made" selection, we see they're the same:

result_manual = df[(df.sepal_length > 5) | (df.sepal_width < 2) | (df.species == 'virginica')]
result_manual.equals(result)  # == True

1 Comment

I learnt a couple of things from this answer: using reduce to create masks. Don't know why I never thought of that. Thanks.
