How to retrieve cells from a dataframe based on condition from another dataframe

Question

I have two dataframes. The first contains float values representing average speeds.

           0          1          2
1  15.610826  19.182879   6.678087
2  13.740250  15.666897  17.640749
3   2.379010   2.889702   2.955097
4  20.540628   9.661226   9.479921

The second contains the geographical coordinates at which each average speed was measured.

   0                                 1                                 2
1  [52.2399255, 21.0654495]          [52.23893150000001, 21.06087]     [52.23800850000001, 21.056779]
2  [52.2449705, 21.0755175]          [52.2452905, 21.075118000000003]  [52.245557500000004, 21.0748175]
3  [52.2401885, 21.012981500000002]  [52.239134, 21.009432]            [52.238420500000004, 21.007080000000002]
4  [52.221506500000004, 20.9665085]  [52.222458, 20.968952]            [52.224409, 20.969248999999998]

Now I want to create a list of the coordinates where the average speed is above 18. In this case that would be:

list_above_18=[[52.23893150000001, 21.06087] , [52.221506500000004, 20.9665085]]

How can I select values from a dataframe based on values in another dataframe?

There is no way for me to test anything here since you haven't provided enough code, but you can try creating a mask such as mask = df1.loc[:, 0] > 18 and then using it to filter the other dataframe, e.g. df2.loc[mask, 0]. This is just a mask for column 0.
– armara, Commented Dec 25, 2020 at 18:51
You should avoid loops, especially with pandas and numpy. Vectorise wherever possible, as it will be faster.
– ggaurav, Commented Dec 25, 2020 at 19:43
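
The per-column mask suggested in the first comment might be extended to every column roughly as follows. This is only a sketch: the data is copied from the question, the names df1 and df2 follow the comment, and it assumes both dataframes share the same index and column labels.

```python
import pandas as pd

# Data copied from the question; df1 holds speeds, df2 holds coordinates.
df1 = pd.DataFrame(
    [
        [15.610826, 19.182879, 6.678087],
        [13.740250, 15.666897, 17.640749],
        [2.379010, 2.889702, 2.955097],
        [20.540628, 9.661226, 9.479921],
    ],
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    [
        [[52.2399255, 21.0654495], [52.23893150000001, 21.06087], [52.23800850000001, 21.056779]],
        [[52.2449705, 21.0755175], [52.2452905, 21.075118000000003], [52.245557500000004, 21.0748175]],
        [[52.2401885, 21.012981500000002], [52.239134, 21.009432], [52.238420500000004, 21.007080000000002]],
        [[52.221506500000004, 20.9665085], [52.222458, 20.968952], [52.224409, 20.969248999999998]],
    ],
    index=[1, 2, 3, 4],
)

selected = []
for col in df1.columns:
    mask = df1.loc[:, col] > 18                    # boolean Series for this column
    selected.extend(df2.loc[mask, col].tolist())   # matching coordinates, same column

print(selected)
```

Because the loop goes column by column, the coordinates come out in column order rather than row order.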

IoaTzimas · Accepted Answer · 2020-12-25 19:08:36Z

1

You can zip the rows of the two dataframes together and compare the elements pairwise. See below (A and B are your dataframes, in the same order you provided them):

list_above_18 = []

# Walk both dataframes row by row; A holds the speeds, B the coordinates
for speeds, coords in zip(A.values, B.values):
    for speed, coord in zip(speeds, coords):
        if speed > 18:
            list_above_18.append(coord)

Output:

>>> print(list_above_18)
[[52.23893150000001, 21.06087], [52.221506500000004, 20.9665085]]
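
For anyone who wants to run this end to end, here is a self-contained sketch of the same pairwise walk, condensed into a list comprehension. The data is copied from the question; the helper name coords_above and its threshold parameter are made up for illustration.

```python
import pandas as pd

# Data copied from the question; A holds speeds, B holds coordinates.
A = pd.DataFrame(
    [
        [15.610826, 19.182879, 6.678087],
        [13.740250, 15.666897, 17.640749],
        [2.379010, 2.889702, 2.955097],
        [20.540628, 9.661226, 9.479921],
    ],
    index=[1, 2, 3, 4],
)
B = pd.DataFrame(
    [
        [[52.2399255, 21.0654495], [52.23893150000001, 21.06087], [52.23800850000001, 21.056779]],
        [[52.2449705, 21.0755175], [52.2452905, 21.075118000000003], [52.245557500000004, 21.0748175]],
        [[52.2401885, 21.012981500000002], [52.239134, 21.009432], [52.238420500000004, 21.007080000000002]],
        [[52.221506500000004, 20.9665085], [52.222458, 20.968952], [52.224409, 20.969248999999998]],
    ],
    index=[1, 2, 3, 4],
)


def coords_above(speed_df, coord_df, threshold):
    """Return the coordinates whose speed at the same position exceeds threshold.

    Hypothetical helper; it pairs cells purely by position, so both
    dataframes must have the same shape.
    """
    return [
        coord
        for speeds, coords in zip(speed_df.to_numpy(), coord_df.to_numpy())
        for speed, coord in zip(speeds, coords)
        if speed > threshold
    ]


list_above_18 = coords_above(A, B, 18)
print(list_above_18)
```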

edited Dec 25, 2020 at 19:08

answered Dec 25, 2020 at 19:01

IoaTzimas


Ankit Singh · Accepted Answer · 2020-12-25 19:05:37Z

Assuming the average-speed dataframe always has the same shape as the coordinates dataframe, you can try the following:

coord_df[data_df.iloc[:,:] > 18].T.stack().values

Here:
coord_df = dataframe with the coordinate values
data_df = dataframe with the average speed values

This returns a numpy array containing only the coordinate values where the average speed is greater than 18.

How this works:

data_df.iloc[:,:] > 18

Creates a boolean mask dataframe in which every value greater than 18 is marked True and every other value (18 or below) is marked False.

coord_df[data_df.iloc[:,:] > 18]

Applies the mask to the target dataframe, i.e. the coordinate dataframe. The result is a dataframe that keeps the coordinate values only in the cells where the mask is True (where the average speed was above 18) and holds NaN everywhere else.

.T.stack().values

This then retrieves only the non-null values from the resulting dataframe and returns them as a numpy array.

Reference:

Get non-null elements in a pandas DataFrame (used for the .T.stack().values step, which keeps only the non-null values of a dataframe)
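
The three steps above can be tried on the question's data with the sketch below. Two things in it are my own assumptions rather than part of the answer: the redundant .iloc[:, :] is left out, and an explicit .dropna() is added after stack() because, as far as I know, newer pandas versions may keep missing entries when stacking.

```python
import pandas as pd

# Data copied from the question; names follow the answer above.
data_df = pd.DataFrame(
    [
        [15.610826, 19.182879, 6.678087],
        [13.740250, 15.666897, 17.640749],
        [2.379010, 2.889702, 2.955097],
        [20.540628, 9.661226, 9.479921],
    ],
    index=[1, 2, 3, 4],
)
coord_df = pd.DataFrame(
    [
        [[52.2399255, 21.0654495], [52.23893150000001, 21.06087], [52.23800850000001, 21.056779]],
        [[52.2449705, 21.0755175], [52.2452905, 21.075118000000003], [52.245557500000004, 21.0748175]],
        [[52.2401885, 21.012981500000002], [52.239134, 21.009432], [52.238420500000004, 21.007080000000002]],
        [[52.221506500000004, 20.9665085], [52.222458, 20.968952], [52.224409, 20.969248999999998]],
    ],
    index=[1, 2, 3, 4],
)

mask = data_df > 18            # step 1: boolean dataframe, True where speed > 18
masked = coord_df[mask]        # step 2: coordinates where True, NaN elsewhere
# step 3: flatten to one column and keep only the surviving coordinates
result = masked.T.stack().dropna().tolist()

print(result)
```

Transposing before stacking means the values are collected column by column, so the order can differ from a row-by-row loop.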

ggaurav · Accepted Answer · 2020-12-25 19:31:08Z

1

Let the first dataframe be df1 and the second be df2:

output_array = df2[df1 > 18].values.flatten()  # df1 > 18 creates the boolean mask
# Drop the NaN placeholders; np.isnan can't be used because it would not work on list elements
output_array = [val for val in output_array if isinstance(val, list)]

Sample input: df1 and df2 (contents not preserved here)

output_array

[[15.1, 20.5], [91.5, 95.8]]
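
Since the sample frames are not available, here is a sketch that runs the same two lines on the data from the question. It also shows a variant that is not part of this answer: indexing the underlying NumPy arrays with the boolean mask directly, which avoids the NaN filtering step. That variant pairs cells by position only, so it assumes df1 and df2 have the same shape and ordering.

```python
import pandas as pd

# Data copied from the question; df1 holds speeds, df2 holds coordinates.
df1 = pd.DataFrame(
    [
        [15.610826, 19.182879, 6.678087],
        [13.740250, 15.666897, 17.640749],
        [2.379010, 2.889702, 2.955097],
        [20.540628, 9.661226, 9.479921],
    ],
    index=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    [
        [[52.2399255, 21.0654495], [52.23893150000001, 21.06087], [52.23800850000001, 21.056779]],
        [[52.2449705, 21.0755175], [52.2452905, 21.075118000000003], [52.245557500000004, 21.0748175]],
        [[52.2401885, 21.012981500000002], [52.239134, 21.009432], [52.238420500000004, 21.007080000000002]],
        [[52.221506500000004, 20.9665085], [52.222458, 20.968952], [52.224409, 20.969248999999998]],
    ],
    index=[1, 2, 3, 4],
)

# Approach from the answer: mask, flatten, then drop the NaN placeholders.
flat = df2[df1 > 18].values.flatten()
output_array = [val for val in flat if isinstance(val, list)]

# Variant (my assumption, not from the answer): boolean-index the raw arrays.
direct = df2.to_numpy()[(df1 > 18).to_numpy()].tolist()

print(output_array)
print(direct)
```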

answered Dec 25, 2020 at 19:31

ggaurav
