0

We have two dataframes. The first one contains float values representing average speeds.

           0          1          2
1  15.610826  19.182879   6.678087
2  13.740250  15.666897  17.640749
3   2.379010   2.889702   2.955097
4  20.540628   9.661226   9.479921

The second dataframe contains the geographical coordinates at which each average speed was recorded.

   0                                 1                                 2
1  [52.2399255, 21.0654495]          [52.23893150000001, 21.06087]     [52.23800850000001, 21.056779]
2  [52.2449705, 21.0755175]          [52.2452905, 21.075118000000003]  [52.245557500000004, 21.0748175]
3  [52.2401885, 21.012981500000002]  [52.239134, 21.009432]            [52.238420500000004, 21.007080000000002]
4  [52.221506500000004, 20.9665085]  [52.222458, 20.968952]            [52.224409, 20.969248999999998]

Now I want to create a list of the coordinates where the average speed is above 18. In this case, that would be:

list_above_18 = [[52.23893150000001, 21.06087], [52.221506500000004, 20.9665085]]

How can I select values from a dataframe based on values in another dataframe?

2
  • There is no way for me to test anything here since you haven't provided enough code, but you can try creating a mask, e.g. mask = df1.loc[:, 0] > 18, and then using it to filter the other dataframe, e.g. df2.loc[mask, 0]. Note that this mask only covers column 0. Commented Dec 25, 2020 at 18:51
  • You should avoid loops, especially with pandas and NumPy. Vectorise wherever possible, as it will be faster. Commented Dec 25, 2020 at 19:43

3 Answers

1

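You can zip the underlying arrays of the two dataframes row by row and compare the elements pairwise (enumerate is not actually needed for this). See below, where A and B are your dataframes in the order you provided them:

list_above_18 = []

# Walk both dataframes row by row, then cell by cell within each row.
for speeds, coords in zip(A.values, B.values):
    for speed, coord in zip(speeds, coords):
        if speed > 18:
            list_above_18.append(coord)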

Output:

>>> print(list_above_18)
[[52.23893150000001, 21.06087], [52.221506500000004, 20.9665085]]
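
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds both dataframes from the values in the question and applies the same row-by-row comparison. The construction of A and B is my assumption about how the data is laid out (index 1 to 4, columns 0 to 2, each coordinate cell holding a Python list); adjust it if your frames are built differently.

```python
import pandas as pd

# Average speeds, copied from the question (index 1..4, columns 0..2).
A = pd.DataFrame(
    [
        [15.610826, 19.182879, 6.678087],
        [13.740250, 15.666897, 17.640749],
        [2.379010, 2.889702, 2.955097],
        [20.540628, 9.661226, 9.479921],
    ],
    index=[1, 2, 3, 4],
)

# Coordinates, copied from the question; each cell is a [lat, lon] list.
B = pd.DataFrame(
    [
        [[52.2399255, 21.0654495], [52.23893150000001, 21.06087], [52.23800850000001, 21.056779]],
        [[52.2449705, 21.0755175], [52.2452905, 21.075118000000003], [52.245557500000004, 21.0748175]],
        [[52.2401885, 21.012981500000002], [52.239134, 21.009432], [52.238420500000004, 21.007080000000002]],
        [[52.221506500000004, 20.9665085], [52.222458, 20.968952], [52.224409, 20.969248999999998]],
    ],
    index=[1, 2, 3, 4],
)

list_above_18 = []

# Pair up rows of both frames, then pair up the cells inside each row.
for speeds, coords in zip(A.to_numpy(), B.to_numpy()):
    for speed, coord in zip(speeds, coords):
        if speed > 18:
            list_above_18.append(coord)

print(list_above_18)
```

Because the rows are visited in order, the coordinates come out in the same row-by-row order as in the question's expected list.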


1

Assuming the average speed dataframe has the same shape (and the same index and column labels) as the coordinates dataframe, you can try the following:

coord_df[data_df.iloc[:,:] > 18].T.stack().values

Here, coord_df is the dataframe with the coordinate values and data_df is the dataframe with the average speed values.

This returns a NumPy array containing only the coordinate values where the average speed is greater than 18.

How this works:

data_df.iloc[:,:] > 18

Creates a boolean mask dataframe in which every value that is not greater than 18 is marked False and the rest are marked True.

coord_df[data_df.iloc[:,:] > 18]

Applies the mask to the target dataframe, i.e. the coordinates dataframe. The result is a dataframe that keeps the coordinate values only in the cells where the mask is True (where the average speed was above 18) and holds NaN everywhere else.

.T.stack().values

This then retrieves only the non-null values from the resulting dataframe and returns them as a NumPy array. Because of the transpose, the values come out column by column rather than row by row.

Reference I used:

  1. Get non-null elements in a pandas DataFrame: used for extracting only the non-null values from a dataframe (.T.stack().values)
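
As a variation on the same masking idea, here is a rough sketch that applies the boolean mask directly to the underlying NumPy arrays. It returns the matches in row order and does not rely on stack() discarding nulls, which I understand has changed between pandas versions. The names speeds and coords and the toy values are made up for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical toy data: a 2x2 grid of speeds and a matching grid of [x, y] pairs.
speeds = pd.DataFrame([[10.0, 20.0], [30.0, 5.0]])
coords = pd.DataFrame([[[0, 0], [0, 1]], [[1, 0], [1, 1]]])

# Boolean mask as a plain 2-D NumPy array: True where the speed exceeds 18.
mask = (speeds > 18).to_numpy()

# Indexing the 2-D object array with the mask picks the matching cells
# in row-major order and returns them as a 1-D object array.
selected = coords.to_numpy()[mask]

# Convert to a plain Python list of coordinate lists.
list_above_18 = selected.tolist()
print(list_above_18)
```

This assumes both frames have the same shape and that their rows and columns line up by position, since the NumPy arrays ignore the index and column labels.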


1

Let the first dataframe be df1 and the second be df2.

masked = df2[df1 > 18]  # df1 > 18 builds the boolean mask; cells that fail it become NaN
output_array = masked.values.flatten()
# Drop the NaN placeholders. np.isnan can't be used here because the remaining cells are lists.
output_array = [val for val in output_array if isinstance(val, list)]

Sample input: df1 and df2 (shown as images in the original post; the images are not available here).

Resulting output_array:

[[15.1, 20.5], [91.5, 95.8]]
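
Since the sample inputs above were only posted as images, here is a small self-contained sketch of the same mask, flatten and filter steps. The values in df1 and df2 are hypothetical stand-ins that I picked for illustration; they are not the data from the images.

```python
import pandas as pd

# Hypothetical stand-in data: speeds in df1, [x, y] pairs in df2.
df1 = pd.DataFrame([[12.0, 25.0], [19.5, 3.0]])
df2 = pd.DataFrame([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])

# Keep df2's cells where df1 > 18; every other cell is replaced by NaN.
masked = df2[df1 > 18]

# Flatten row by row into a 1-D object array of lists and NaN placeholders.
flat = masked.to_numpy().flatten()

# Keep only the list cells; isinstance skips the float NaN placeholders.
output_array = [val for val in flat if isinstance(val, list)]
print(output_array)
```

The isinstance check works only because every real cell is a list; if the cells held tuples or arrays instead, the filter would need to be adjusted accordingly.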
