Extract rows from pandas dataframe with at least one value greater than or equal to values from array

Question

I have the following dataframe:

    mba_p   ssc_p   hsc_p   degree_p etest_p    

0   58.80   67.00   91.00   58.00   55.0
1   66.28   79.33   78.33   77.48   86.5
2   57.80   65.00   68.00   64.00   75.0
3   59.43   56.00   52.00   52.00   66.0
4   55.50   85.80   73.60   73.30   96.8
... ... ... ... ... ...
210 74.49   80.60   82.00   77.60   91.0
211 53.62   58.00   60.00   72.00   74.0
212 69.72   67.00   67.00   73.00   59.0
213 60.23   74.00   66.00   58.00   70.0
214 60.22   62.00   58.00   53.00   89.0

I would like to filter the data based on this condition:

Extract all the rows where at least one value is greater than any of the values in this list:

[66.255, 75.7, 73.0, 72.0, 83.5]

tl;dr: the first row is a candidate since 67 and 91 are greater than more than one value in the list above (the values in the list are the 75th percentile of every column basically). Similarly, extract all such rows.

I tried using np.where() using the steps shown here but since that is filtering based only on one value and not a list, I can't figure how to extend that to this situation. A simple condition check would return rows only where all values are greater than equal I guess (unless I'm missing anything). Sorry if this sounds a little convoluted but not able to crack it yet.

I feel like the statement Extract all the rows where at least one value is greater than any of the values in this list basically boils down to Extract all the rows where at least one value is greater than min(arr) which is 66.255? Can you confirm if that's what you are trying to do? — Equinox
– Equinox, Commented Aug 23, 2022 at 5:57
I'm actually trying to filter for rows where any value is greater than 66.255 or 75.7, 73.0 or 72.0 or 83.5. — Cat_Dog
– Cat_Dog, Commented Aug 23, 2022 at 6:06
If a value is greater than 66.255 it will be greater than all 75.7, 73.0, 72.0, 83.5. So you don't really need to check if any value is greater than the whole array. You are essentially checking if any value is greater than 66.255 — Equinox
– Equinox, Commented Aug 23, 2022 at 6:25

Guy · Accepted Answer · 2022-08-23 06:12:50Z

You can use a mask of dataframe to get rows with values greater than the minimum in the array

num = min(66.255, 75.7, 73.0, 72.0, 83.5)
print(df.loc[df[df > num].any(axis=1)])

Output

   mba_p  ssc_p  hsc_p  degree_p  etest_p
0  58.80  67.00  91.00     58.00     55.0
1  66.28  79.33  78.33     77.48     86.5
2  57.80  65.00  68.00     64.00     75.0
4  55.50  85.80  73.60     73.30     96.8
5  74.49  80.60  82.00     77.60     91.0
6  53.62  58.00  60.00     72.00     74.0
7  69.72  67.00  67.00     73.00     59.0
8  60.23  74.00  66.00     58.00     70.0
9  60.22  62.00  58.00     53.00     89.0

Onyambu · Accepted Answer · 2022-08-23 06:03:40Z

a = np.array([66.255, 75.7, 73.0, 72.0, 83.5])

df[df.apply(lambda x: (x.to_numpy() > a[:, None]).any(), axis = 1)]

     mba_p  ssc_p  hsc_p  degree_p  etest_p
0    58.80  67.00  91.00     58.00     55.0
1    66.28  79.33  78.33     77.48     86.5
2    57.80  65.00  68.00     64.00     75.0
4    55.50  85.80  73.60     73.30     96.8
210  74.49  80.60  82.00     77.60     91.0
211  53.62  58.00  60.00     72.00     74.0
212  69.72  67.00  67.00     73.00     59.0
213  60.23  74.00  66.00     58.00     70.0
214  60.22  62.00  58.00     53.00     89.0

Collectives™ on Stack Overflow

Extract rows from pandas dataframe with at least one value greater than or equal to values from array

2 Answers 2

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Linked

Related