select rows based on certain conditions with pandas

Question

I'd like to return the rows in which all columns are > 0, or in which only the 2012 column is < 0.

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
   'B': [5,5,6,7,5,6,6,7,7,6,7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
    })

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

@jezrael, not exactly. I changed the DataFrame to explain it better. In the final result I need the rows where all columns are > 0, and also the rows where all columns are > 0 except 2012, which may be < 0. The result should be a new df containing the rows that qualify. So, in the example below: g yes, d no.

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g'],
   'B': [5,5,6,-7,5,6,-6,7,7,6,-7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013]
    })

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7    -6     6
g   -11     8     8     8     7

Edited DataFrame:

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g',
         'k','k','k','k','k','k'],
   'B': [5,5,6,7,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,
         2013,2012,2013,2014,2015,2016,2014]
    })

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7     8     6
g   -11     8     8     8     7
k    -6     9    48     8    -5

'where only 2012 can be < 0' err, what?? 2012 is greater than zero
– ForceBru, Commented Nov 20, 2016 at 15:39
Column 2012, as produced by: df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
– Zanshin, Commented Nov 20, 2016 at 15:40

jezrael · Accepted Answer · 2016-11-20 16:41:27Z

I think you can use two masks, one to select rows and one to select columns:

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','g','h','h','h', 'f'],
   'B': [5,5,6,7,5,6,-6,7,7,7,6,7,7,2],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2013,2014,2015,2016,2012,2013,2013]
    })

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S  2012  2013  2014  2015  2016
A                              
d     6     6     7     0     0
f     7     3     0     8     6
g    -5     8     8     8     0
h     8     8     0     0     7

mask1 = df[2012] < 0
print (mask1)
A
d    False
f    False
g     True
h    False
Name: 2012, dtype: bool

mask2 = (df > 0).all()
print (mask2)
S
2012    False
2013     True
2014    False
2015    False
2016    False
dtype: bool

print (df.loc[mask1, mask2])
S  2013
A      
g     8

print (df[mask1])
S  2012  2013  2014  2015  2016
A                              
g    -5     8     8     8     0

print (df.loc[:,mask2])
S  2013
A      
d     6
f     3
g     8
h     8
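
The two masks above differ in the axis that `all` reduces over, which is why one selects rows and the other selects columns. A minimal sketch of that difference, using a hypothetical three-column frame (`toy`, `col_mask` and `row_mask` are illustrative names, not from the answer):

```python
import pandas as pd

# Hypothetical toy frame, not the data from the question.
toy = pd.DataFrame(
    {2012: [6, -5], 2013: [6, 8], 2014: [7, 0]},
    index=pd.Index(["d", "g"], name="A"),
)

# Default axis=0: one boolean per column (is every row in that column > 0?)
col_mask = (toy > 0).all()

# axis=1: one boolean per row (is every column in that row > 0?)
row_mask = (toy > 0).all(axis=1)

# .loc takes the row selector first and the column selector second
subset = toy.loc[row_mask, col_mask]
print(subset)
```

Here only row d is positive everywhere and only column 2013 is positive for every row, so `subset` holds a single cell.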

EDIT, following the edit to the question:

mask1 = df[2012] < 0
print (mask1)
A
d    False
g     True
Name: 2012, dtype: bool

mask2 = (df.drop(2012, axis=1) > 0).all(axis=1)
print (mask2)
A
d    False
g     True
dtype: bool

print (df[mask1 & mask2])
S  2012  2013  2014  2015  2016
A                              
g   -11     8     8     8     7

mask1 brings me all rows with a negative value in 2012, but also rows where other columns are negative. Those need to be positive: 2012 can be negative, but then the others need to be positive. Or all columns positive, but mask2 shows that.
Sorry, I didn't explain it. mask1 checks only whether column 2012 is less than 0; mask2 first removes column 2012 and then checks whether all values in the remaining columns are greater than 0. Both masks are then combined with & for boolean indexing. Am I missing something?
Nope, you're not. I missed a negative in another column and got an empty df. I checked and it works, thanks. By the way, if I want to add another criterion, say the total of the years per row needs to be > 20 ('g' won't show then), can I solve that with a mask?
I think you then need mask3 = df.sum(axis=1) > 20 and print (df[mask1 & mask2 & mask3]), or print (df[(df[2012] < 0) & ((df.drop(2012, axis=1) > 0).all(axis=1)) & (df.sum(axis=1) > 20)]). I think you can check the docs; they explain it better than I do.
I think we're close; however, rows with all columns positive don't show.
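
One possible way to address that last comment, offered as a sketch rather than as part of the answer above: treat "all columns positive" and "only 2012 negative" as two separate row masks and combine them with `|`. The names `all_positive`, `others_positive`, `only_2012_negative` and `over_20` are illustrative; the data is the edited frame from the question.

```python
import pandas as pd

# Edited frame from the question
df = pd.DataFrame({
    'A': ['d'] * 6 + ['g'] * 6 + ['k'] * 6,
    'B': [5, 5, 6, 7, 5, 6, -6, 7, 7, 6, -7, 7, -8, 7, -6, 6, -7, 50],
    'C': [1] * 12 + [2] * 6,
    'S': [2012, 2013, 2014, 2015, 2016, 2012,
          2012, 2014, 2015, 2016, 2012, 2013,
          2012, 2013, 2014, 2015, 2016, 2014],
})
df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)

# Case 1: every column in the row is positive
all_positive = (df > 0).all(axis=1)

# Case 2: 2012 is negative and every other column is positive
others_positive = (df.drop(2012, axis=1) > 0).all(axis=1)
only_2012_negative = (df[2012] < 0) & others_positive

# A row qualifies if either case holds
result = df[all_positive | only_2012_negative]
print(result)

# Optional extra criterion from the comments: row total above 20
over_20 = result[result.sum(axis=1) > 20]
print(over_20)
```

On this data the first selection keeps d and g and drops k, whose 2016 value is negative. Adding the sum criterion leaves only d, because the g row sums to exactly 20.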

Zeugma · Answer · 2016-11-20 15:45:51Z

Combine the operators and use parentheses:

df[((df > 0).all(axis=1)) | (df[2012] < 0)]
Out[22]: 
Empty DataFrame
Columns: [2012, 2013, 2014, 2015, 2016]
Index: []
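
The parentheses matter because Python's `|` binds more tightly than comparison operators such as `>` and `<`. Note also that `df[2012] < 0` on its own does not look at the other columns. A small sketch of that effect, using the pivoted values copied from the edited question's output (`loose` and `strict` are illustrative names, and the tightened condition is an assumption about what the asker wants, not part of this answer):

```python
import pandas as pd

# Pivoted values copied from the edited question's printed output
df = pd.DataFrame(
    [[13, 6, 7, 8, 6], [-11, 8, 8, 8, 7], [-6, 9, 48, 8, -5]],
    index=pd.Index(['d', 'g', 'k'], name='A'),
    columns=pd.Index([2012, 2013, 2014, 2015, 2016], name='S'),
)

# The expression as written: any row with a negative 2012 passes,
# even if another column is negative too (row k here).
loose = df[((df > 0).all(axis=1)) | (df[2012] < 0)]

# Tightened variant: a negative 2012 only counts when the rest is positive.
rest_positive = (df.drop(2012, axis=1) > 0).all(axis=1)
strict = df[((df > 0).all(axis=1)) | ((df[2012] < 0) & rest_positive)]

print(loose.index.tolist())
print(strict.index.tolist())
```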
