1

I have the following dataframe:

    mba_p   ssc_p   hsc_p   degree_p etest_p    

0   58.80   67.00   91.00   58.00   55.0
1   66.28   79.33   78.33   77.48   86.5
2   57.80   65.00   68.00   64.00   75.0
3   59.43   56.00   52.00   52.00   66.0
4   55.50   85.80   73.60   73.30   96.8
... ... ... ... ... ...
210 74.49   80.60   82.00   77.60   91.0
211 53.62   58.00   60.00   72.00   74.0
212 69.72   67.00   67.00   73.00   59.0
213 60.23   74.00   66.00   58.00   70.0
214 60.22   62.00   58.00   53.00   89.0

I would like to filter the data based on this condition:

Extract all the rows where at least one value is greater than any of the values in this list:

[66.255, 75.7, 73.0, 72.0, 83.5]

tl;dr: the first row is a candidate since 67 and 91 are greater than more than one value in the list above (the values in the list are the 75th percentile of every column basically). Similarly, extract all such rows.

I tried using np.where() using the steps shown here but since that is filtering based only on one value and not a list, I can't figure how to extend that to this situation. A simple condition check would return rows only where all values are greater than equal I guess (unless I'm missing anything). Sorry if this sounds a little convoluted but not able to crack it yet.

3
  • 2
    I feel like the statement Extract all the rows where at least one value is greater than any of the values in this list basically boils down to Extract all the rows where at least one value is greater than min(arr) which is 66.255? Can you confirm if that's what you are trying to do? Commented Aug 23, 2022 at 5:57
  • I'm actually trying to filter for rows where any value is greater than 66.255 or 75.7, 73.0 or 72.0 or 83.5. Commented Aug 23, 2022 at 6:06
  • 1
    If a value is greater than 66.255 it will be greater than all 75.7, 73.0, 72.0, 83.5. So you don't really need to check if any value is greater than the whole array. You are essentially checking if any value is greater than 66.255 Commented Aug 23, 2022 at 6:25

2 Answers 2

2

You can use a mask of dataframe to get rows with values greater than the minimum in the array

num = min(66.255, 75.7, 73.0, 72.0, 83.5)
print(df.loc[df[df > num].any(axis=1)])

Output

   mba_p  ssc_p  hsc_p  degree_p  etest_p
0  58.80  67.00  91.00     58.00     55.0
1  66.28  79.33  78.33     77.48     86.5
2  57.80  65.00  68.00     64.00     75.0
4  55.50  85.80  73.60     73.30     96.8
5  74.49  80.60  82.00     77.60     91.0
6  53.62  58.00  60.00     72.00     74.0
7  69.72  67.00  67.00     73.00     59.0
8  60.23  74.00  66.00     58.00     70.0
9  60.22  62.00  58.00     53.00     89.0
Sign up to request clarification or add additional context in comments.

Comments

0
a = np.array([66.255, 75.7, 73.0, 72.0, 83.5])

df[df.apply(lambda x: (x.to_numpy() > a[:, None]).any(), axis = 1)]

     mba_p  ssc_p  hsc_p  degree_p  etest_p
0    58.80  67.00  91.00     58.00     55.0
1    66.28  79.33  78.33     77.48     86.5
2    57.80  65.00  68.00     64.00     75.0
4    55.50  85.80  73.60     73.30     96.8
210  74.49  80.60  82.00     77.60     91.0
211  53.62  58.00  60.00     72.00     74.0
212  69.72  67.00  67.00     73.00     59.0
213  60.23  74.00  66.00     58.00     70.0
214  60.22  62.00  58.00     53.00     89.0

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.