
I'd like to return the rows that have all columns > 0, or where only the 2012 column may be < 0.

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
   'B': [5,5,6,7,5,6,6,7,7,6,7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
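The reshaping line sums B + C for every (A, S) pair and then moves the S values into the columns. For comparison, here is a minimal sketch that builds the same table with `pivot_table` instead. This is a swapped-in alternative, not what the question uses, and `total` is a hypothetical helper column name. Because the dtype `pivot_table` returns (int or float) can vary between pandas versions, the check compares only labels and values.

```python
import pandas as pd

# Question's first DataFrame
raw = pd.DataFrame({
    'A': ['d', 'd', 'd', 'f', 'f', 'f', 'g', 'g', 'g', 'h', 'h', 'h'],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
    'S': [2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016, 2012, 2013],
})

# Question's approach: sum B + C per (A, S) pair, then move S into the columns
by_unstack = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

# Same table via pivot_table; 'total' is a hypothetical helper column name
by_pivot = (raw.assign(total=raw.B + raw.C)
               .pivot_table(index='A', columns='S', values='total',
                            aggfunc='sum', fill_value=0))

# pivot_table's result dtype (int vs float) can vary by pandas version,
# so only labels and values are compared here
pd.testing.assert_frame_equal(by_unstack, by_pivot, check_dtype=False)
print(by_unstack)
```

Missing (A, S) combinations become 0 in both versions because of `fill_value=0`.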

@jezrael, not exactly. I changed the DataFrame to explain it better. In the final result I need the rows where all columns are > 0, AND the rows where all columns are > 0 except 2012, which can be < 0. The result must be a new df with the rows that qualify. So, in the example below: g yes, d no.

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g'],
   'B': [5,5,6,-7,5,6,-6,7,7,6,-7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7    -6     6
g   -11     8     8     8     7

EDITED DataFrame:

df = pd.DataFrame( {
   'A':  ['d','d','d','d','d','d','g','g','g','g','g','g',
    'k','k','k','k','k','k'],
   'B': [5,5,6,7,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
   'S':   [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,
        2013,2012,2013,2014,2015,2016,2014]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

S  2012  2013  2014  2015  2016
A                              
d    13     6     7     8     6
g   -11     8     8     8     7
k    -6     9    48     8    -5
  • 'where only 2012 can be < 0' err, what?? 2012 is greater than zero Commented Nov 20, 2016 at 15:39
  • column 2012: df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0) Commented Nov 20, 2016 at 15:40
  • yes and nothing is negative in that column Commented Nov 20, 2016 at 15:41

2 Answers


I think you can use two masks, one to compare rows and one to compare columns:

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','g','h','h','h', 'f'],
   'B': [5,5,6,7,5,6,-6,7,7,7,6,7,7,2],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
   'S': [2012,2013,2014,2015,2016,2012,2012,2013,2014,2015,2016,2012,2013,2013]     
    } );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)
S  2012  2013  2014  2015  2016
A                              
d     6     6     7     0     0
f     7     3     0     8     6
g    -5     8     8     8     0
h     8     8     0     0     7

mask1 = df[2012] < 0
print (mask1)
A
d    False
f    False
g     True
h    False
Name: 2012, dtype: bool

mask2 = (df > 0).all()
print (mask2)
S
2012    False
2013     True
2014    False
2015    False
2016    False
dtype: bool

print (df.loc[mask1, mask2])
S  2013
A      
g     8

print (df[mask1])
S  2012  2013  2014  2015  2016
A                              
g    -5     8     8     8     0

print (df.loc[:,mask2])
S  2013
A      
d     6
f     3
g     8
h     8
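The key point in this answer is which axis each mask labels: `df[2012] < 0` is indexed by A (the rows), while `(df > 0).all()` is indexed by S (the columns), which is why the second mask goes in the column slot of `.loc`. A minimal self-contained sketch that rebuilds the answer's table and prints each mask's index:

```python
import pandas as pd

# The answer's example data
raw = pd.DataFrame({
    'A': ['d', 'd', 'd', 'f', 'f', 'f', 'g', 'g', 'g', 'g', 'h', 'h', 'h', 'f'],
    'B': [5, 5, 6, 7, 5, 6, -6, 7, 7, 7, 6, 7, 7, 2],
    'C': [1] * 14,
    'S': [2012, 2013, 2014, 2015, 2016, 2012, 2012, 2013, 2014, 2015, 2016, 2012, 2013, 2013],
})
df = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

row_mask = df[2012] < 0    # one boolean per row label (A)
col_mask = (df > 0).all()  # one boolean per column label (S); axis=0 is the default

# The index of each mask shows which axis it can select on
print(row_mask.index.tolist())  # ['d', 'f', 'g', 'h']
print(col_mask.index.tolist())  # [2012, 2013, 2014, 2015, 2016]

# Row mask in the first .loc slot, column mask in the second
print(df.loc[row_mask, col_mask])
```

Passing `axis=1` to `.all()` instead would produce a per-row mask, which is what the edited version of this answer uses.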

EDIT, after the question was edited:

mask1 = df[2012] < 0
print (mask1)
A
d    False
g     True
Name: 2012, dtype: bool

mask2 = (df.drop(2012, axis=1) > 0).all(axis=1)
print (mask2)
A
d    False
g     True
dtype: bool

print (df[mask1 & mask2])
S  2012  2013  2014  2015  2016
A                              
g   -11     8     8     8     7
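As a sketch, assuming the EDITED DataFrame from the question (the one that adds row k), the same two masks can be wrapped in a small helper. `negative_year_rest_positive` and its `year` parameter are hypothetical names used only for illustration. Row k is excluded because its 2016 value is negative, and d is excluded because its 2012 value is positive.

```python
import pandas as pd

# EDITED DataFrame from the question (rows d, g and k)
raw = pd.DataFrame({
    'A': ['d'] * 6 + ['g'] * 6 + ['k'] * 6,
    'B': [5, 5, 6, 7, 5, 6, -6, 7, 7, 6, -7, 7, -8, 7, -6, 6, -7, 50],
    'C': [1] * 12 + [2] * 6,
    'S': [2012, 2013, 2014, 2015, 2016, 2012, 2012, 2014, 2015, 2016, 2012,
          2013, 2012, 2013, 2014, 2015, 2016, 2014],
})
df = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)


def negative_year_rest_positive(frame, year=2012):
    """Hypothetical helper: rows where `year` < 0 and every other column > 0."""
    mask1 = frame[year] < 0                               # rows: chosen year negative
    mask2 = (frame.drop(year, axis=1) > 0).all(axis=1)   # rows: all other years positive
    return frame[mask1 & mask2]


result = negative_year_rest_positive(df)
print(result)  # only g; k fails because its 2016 value is -5
```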

14 Comments

mask1 gives me all rows with negative values, but also rows where other columns are negative. Those need to be positive: 2012 can be negative, but then the other columns need to be positive. Or all columns are positive, but mask2 shows that.
Sorry, I didn't explain it. mask1 checks only whether column 2012 is less than 0, and mask2 first removes column 2012 and then checks whether all values in the remaining columns are greater than 0. Both masks are then combined with & for boolean indexing. Am I missing something?
Nope, you're not. I missed a negative in another column and got an empty df. I checked and it works, thanks. By the way, if I want to add another criterion, say the row total across all years must be > 20 ('g' won't show then), can I solve that with a mask?
I think you then need mask3 = df.sum(axis=1) > 20 and print (df[mask1 & mask2 & mask3]), or in one line print (df[(df[2012] < 0) & ((df.drop(2012, axis=1) > 0).all(axis=1)) & (df.sum(axis=1) > 20)]). You can check the docs; they explain it better than I can.
I think we're close; however, rows with all columns positive don't show.
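The missing rows come from requiring `df[2012] < 0` for every row. A sketch of the condition as originally stated, assuming rows with all columns > 0 should also be kept: build both cases, combine them with `|`, and only then add the row-total criterion (mask3) with `&`. The intermediate names (`rest_positive`, `all_positive`, `only_2012_negative`, `keep`) are illustrative, and the data is the question's EDITED DataFrame.

```python
import pandas as pd

# EDITED DataFrame from the question (rows d, g and k)
raw = pd.DataFrame({
    'A': ['d'] * 6 + ['g'] * 6 + ['k'] * 6,
    'B': [5, 5, 6, 7, 5, 6, -6, 7, 7, 6, -7, 7, -8, 7, -6, 6, -7, 50],
    'C': [1] * 12 + [2] * 6,
    'S': [2012, 2013, 2014, 2015, 2016, 2012, 2012, 2014, 2015, 2016, 2012,
          2013, 2012, 2013, 2014, 2015, 2016, 2014],
})
df = (raw.B + raw.C).groupby([raw.A, raw.S]).sum().unstack(fill_value=0)

# Every column except 2012 must be > 0 in both accepted cases
rest_positive = (df.drop(2012, axis=1) > 0).all(axis=1)

all_positive = rest_positive & (df[2012] > 0)        # case 1: all columns > 0
only_2012_negative = rest_positive & (df[2012] < 0)  # case 2: only 2012 < 0

# | keeps a row if either case holds
keep = all_positive | only_2012_negative
print(df[keep])  # d (all positive) and g (only 2012 negative); k fails on 2016

# Optional extra criterion from the comments: row total across years > 20
mask3 = df.sum(axis=1) > 20
print(df[keep & mask3])  # only d: g's total is exactly 20
```

Note that a row with 2012 equal to 0 fails both cases, which matches the strict `> 0` / `< 0` wording of the question.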

Combine the operators and use parentheses:

df[((df > 0).all(axis=1)) | (df[2012] < 0)]
Out[22]: 
Empty DataFrame
Columns: [2012, 2013, 2014, 2015, 2016]
Index: []
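Why the parentheses matter: in Python, `&` and `|` bind more tightly than comparison operators such as `<` and `>`. Without parentheses, `df[2012] < 0 | df[2013] > 5` is parsed as the chained comparison `df[2012] < (0 | df[2013]) > 5`, which has to turn a whole boolean Series into a single `True`/`False` and raises `ValueError`. A small sketch; the two-row frame below is made up for illustration and is not the question's data.

```python
import pandas as pd

# Made-up two-row frame for illustration only
df = pd.DataFrame({2012: [-5, 6], 2013: [8, 3]}, index=['g', 'd'])

# With parentheses: element-wise OR of two boolean Series
ok = df[(df[2012] < 0) | (df[2013] > 5)]
print(ok)

# Without parentheses: | is applied first, then the chained comparison
# tries to evaluate a boolean Series as a single bool
try:
    df[df[2012] < 0 | df[2013] > 5]
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The same rule applies to `&` and `~`: wrap every comparison in its own parentheses before combining.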
