
NB: A question similar to this has been asked before but it doesn't exactly answer my question.

How do I subset a pandas DataFrame with many columns, based on a large number of those columns each satisfying some boolean condition?

Right now, I'd have to do something like:

df[(df.column4 > a1) | (df.column23 < a2) | (df.column27 == a3) | ... 
    (df.column56 > a21) | (df.column72 < a22)]

Thanks

  • Well is there any sort of regularity to your conditions? Commented Jan 20, 2017 at 23:25
  • @Lagerbaer No. But supposing all the conditions were '>'. Is there a way for that even? I can't applymap a lambda function on columns and get only the required rows. Commented Jan 20, 2017 at 23:29
  • Well if you'd have a list of tuples (column name and condition) then you could iterate over that list and incrementally build up the condition to be used within the filter. Commented Jan 20, 2017 at 23:30
  • @Lagerbaer Right. But that's expensive. Commented Jan 20, 2017 at 23:33
  • If your condition is the same, then you can pass a list of columns and compare the entire thing: df[list_of_cols] > some_val gives you a mask; you can then use this mask on the original df: df[df[list_of_cols] > some_val]. Note that it must be a real list, df[['col1','col2',...]] and not df['col1','col2',...], because the former is a list of column labels while the latter will be treated as a tuple and will raise a KeyError, since pandas will try to find a column named 'col1','col2'. Commented Jan 20, 2017 at 23:36
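To make the suggestion in the last comment concrete, here is a small sketch (with made-up column names and data) of the same-condition case. One detail worth spelling out: df[list_of_cols] > some_val produces a DataFrame of booleans, one per cell, so it needs .any(axis=1) to collapse it into a single boolean per row before it can select rows.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col1': [1, 6, 3],
                   'col2': [7, 2, 4],
                   'col3': [0, 0, 8]})

# Same condition (>) applied across several columns at once
list_of_cols = ['col1', 'col2']
some_val = 5

# df[list_of_cols] > some_val is a boolean DataFrame (one value per cell);
# .any(axis=1) collapses it to one boolean per row, True if any column passes
row_mask = (df[list_of_cols] > some_val).any(axis=1)
result = df[row_mask]
```

Use .all(axis=1) instead of .any(axis=1) if the rows should satisfy the condition in every listed column rather than in at least one.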

1 Answer


You'll have to specify your conditions one way or another. You can create individual masks for each condition which you eventually reduce to a single one:

import seaborn as sns
import operator
import numpy as np

# Load a sample dataframe to play with
df = sns.load_dataset('iris')

# Define individual conditions as tuples:
# (column, compare_function, compare_value)
cond1 = ('sepal_length', operator.gt, 5)
cond2 = ('sepal_width', operator.lt, 2)
cond3 = ('species', operator.eq, 'virginica')
conditions = [cond1, cond2, cond3]

# Apply those conditions to the df, creating a list of 3 boolean masks
masks = [fn(df[var], val) for var, fn, val in conditions]
# Reduce those 3 masks to a single one using logical OR
mask = np.logical_or.reduce(masks)

result = df.loc[mask]

(Note: seaborn.apionly and df.ix are gone from current versions of seaborn and pandas; import seaborn as sns and df.loc[mask] do the same job.)

When we compare this with the "hand-made" selection, we see they're the same:

result_manual = df[(df.sepal_length > 5) | (df.sepal_width < 2) | (df.species == 'virginica')]
result_manual.equals(result)  # == True

1 Comment

I learnt a couple of things from this answer: using reduce to create masks. Don't know why I never thought of that. Thanks.
