
I have a 2D array (a) to look up in, and a 1D array (v) of values whose row-wise insertion indices I want to find:

import numpy as np

# [EDIT] Added more records, including all-NaN rows
a = np.array(
[[0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
 [0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578]])

v = np.array([641.954, 56554.498, 168078.307, 1331.692, 2233.327, 1120.03, 641.954, 56554.498, 168078.307, 1331.692, 2233.327])

This is the result I want to get:

[1, 9, 10, 6, 9, 0, 1, 9, 10, 6, 9]

Obviously, with a for loop I can index into a and v row by row like this:

for i, _ in enumerate(a):
    print(np.searchsorted(a[i], v[i]))

Is there a more efficient, vectorized way to do this?

  • So, it could have entire rows that are all NaNs? Could it also have just a few NaN elements in a row, rather than the whole row? Commented Jun 7, 2019 at 8:54
  • @Divakar No, a row can't have just a few NaNs. If a row has NaNs, the entire row is NaN. Commented Jun 11, 2019 at 4:44
  • Consider using Jax's vmap (a sketch follows these comments). Commented Dec 17, 2021 at 15:41
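
Following that vmap suggestion, here is a minimal sketch, assuming JAX is installed; the variable names are illustrative, and all-NaN rows may still need masking as in the answer below:

import jax
import jax.numpy as jnp

# vmap maps searchsorted over the leading axis of both inputs, so each call
# sees one row of `a` and the matching scalar from `v`.
row_searchsorted = jax.vmap(jnp.searchsorted)
idx = row_searchsorted(jnp.asarray(a), jnp.asarray(v))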

1 Answer


Inspired by Vectorized searchsorted numpy for the underlying idea, here's a version that works between a 2D array and a 1D array -

def searchsorted2d(a,b):
    # Inputs : a is (m,n) 2D array and b is (m,) 1D array.
    # Finds np.searchsorted(a[i], b[i]) in a vectorized way by
    # scaling/offsetting both inputs and then using searchsorted

    # Get scaling offset and then scale inputs
    s = np.r_[0,(np.maximum(a.max(1)-a.min(1)+1,b)+1).cumsum()[:-1]]
    a_scaled = (a+s[:,None]).ravel()
    b_scaled = b+s

    # Use searchsorted on scaled ones and then subtract offsets
    return np.searchsorted(a_scaled,b_scaled)-np.arange(len(s))*a.shape[1]
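
To see the offset trick on a tiny example (the values below are made up for illustration, not taken from the question):

a_small = np.array([[1., 3., 5.],
                    [2., 4., 6.]])
b_small = np.array([4., 5.])

# s becomes [0, 6], so row 1 is shifted to [8, 10, 12] and b to [4, 11].
# One flat searchsorted on the concatenation gives [2, 5]; subtracting the
# per-row offsets [0, 3] recovers the per-row insertion indices.
searchsorted2d(a_small, b_small)  # -> array([2, 2])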

Output on the original five-row sample (before the NaN rows were added in the edit) -

In [101]: searchsorted2d(a,v)
Out[101]: array([ 1,  9, 10,  6,  9])

Case with all-NaN rows

To extend it to work for all-NaN rows, we need a few more steps. The all-NaN rows would propagate NaN through the cumulative offsets and break the scaled search, so mask them out and leave 0 at those positions -

valid_mask = ~np.isnan(a).any(1)
out = np.zeros(len(a), dtype=int)
out[valid_mask] = searchsorted2d(a[valid_mask],v[valid_mask])
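
Putting both steps together, a minimal sketch of a wrapper (the name searchsorted2d_nan is just for illustration) that reproduces the expected output from the question:

def searchsorted2d_nan(a, b):
    # Row-wise searchsorted that leaves 0 for rows that are entirely NaN.
    valid_mask = ~np.isnan(a).any(1)
    out = np.zeros(len(a), dtype=int)
    out[valid_mask] = searchsorted2d(a[valid_mask], b[valid_mask])
    return out

searchsorted2d_nan(a, v)  # -> array([ 1,  9, 10,  6,  9,  0,  1,  9, 10,  6,  9])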

3 Comments

Thanks a lot. It's a great idea and worked well with the sample, but some issues came up with the real data, which contains NaNs. Could you give me an idea of how to handle this?
@kokon Please post sample data, the expected output, and hopefully a working solution, even one with loops.
Since comments have a character limit, I updated the sample in the original post above to include NaN records. Currently, with your solution, the rows after the NaN records return negative numbers, which differ from the expected output: array([ 1, 9, 10, 6, 9, 0, -10, -20, -30, -40, -50])
