
I have a 2D array (a) to look up in, and a 1D array (v) of values whose row-wise insertion indices I want to find:

import numpy as np

# [EDIT] Added more records, including all-NaN rows
a = np.array(
[[0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
 [0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578]])

v = np.array([641.954, 56554.498, 168078.307, 1331.692, 2233.327, 1120.03, 641.954, 56554.498, 168078.307, 1331.692, 2233.327])

This is the result I want to get:

[1, 9, 10, 6, 9, 0, 1, 9, 10, 6, 9]

Obviously, with a for loop I can index into a and v row by row like this:

for i, _ in enumerate(a):
    print(np.searchsorted(a[i], v[i]))

Is there a more efficient, vectorized way to do this?

  • So, it could have entire rows that are all NaNs? Could it also have just a few NaN elements in a row, rather than the whole row? Commented Jun 7, 2019 at 8:54
  • @Divakar No, a row can't have just a few NaNs. If a row has NaNs, the entire row is NaN. Commented Jun 11, 2019 at 4:44
  • Consider using Jax's vmap (a sketch follows these comments). Commented Dec 17, 2021 at 15:41
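
Following that vmap suggestion, here is a minimal sketch, assuming JAX is installed; the variable names are illustrative, and all-NaN rows may still need masking as in the answer below:

import jax
import jax.numpy as jnp

# vmap maps searchsorted over the leading axis of both inputs, so each call
# sees one row of `a` and the matching scalar from `v`.
row_searchsorted = jax.vmap(jnp.searchsorted)
idx = row_searchsorted(jnp.asarray(a), jnp.asarray(v))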

1 Answer


Inspired by Vectorized searchsorted numpy for the underlying idea, here's a version that works between a 2D array and a 1D array -

def searchsorted2d(a,b):
    # Inputs : a is (m,n) 2D array and b is (m,) 1D array.
    # Finds np.searchsorted(a[i], b[i]) in a vectorized way by
    # scaling/offsetting both inputs and then using searchsorted

    # Get scaling offset and then scale inputs
    s = np.r_[0,(np.maximum(a.max(1)-a.min(1)+1,b)+1).cumsum()[:-1]]
    a_scaled = (a+s[:,None]).ravel()
    b_scaled = b+s

    # Use searchsorted on scaled ones and then subtract offsets
    return np.searchsorted(a_scaled,b_scaled)-np.arange(len(s))*a.shape[1]
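
To see the offset trick on a tiny example (the values below are made up for illustration, not taken from the question):

a_small = np.array([[1., 3., 5.],
                    [2., 4., 6.]])
b_small = np.array([4., 5.])

# s becomes [0, 6], so row 1 is shifted to [8, 10, 12] and b to [4, 11].
# One flat searchsorted on the concatenation gives [2, 5]; subtracting the
# per-row offsets [0, 3] recovers the per-row insertion indices.
searchsorted2d(a_small, b_small)  # -> array([2, 2])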

Output on the original five-row sample (before the NaN rows were added in the edit) -

In [101]: searchsorted2d(a,v)
Out[101]: array([ 1,  9, 10,  6,  9])

Case with all-NaN rows

To extend it to work for all-NaN rows, we need a few more steps. The all-NaN rows would propagate NaN through the cumulative offsets and break the scaled search, so mask them out and leave 0 at those positions -

valid_mask = ~np.isnan(a).any(1)
out = np.zeros(len(a), dtype=int)
out[valid_mask] = searchsorted2d(a[valid_mask],v[valid_mask])
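
Putting both steps together, a minimal sketch of a wrapper (the name searchsorted2d_nan is just for illustration) that reproduces the expected output from the question:

def searchsorted2d_nan(a, b):
    # Row-wise searchsorted that leaves 0 for rows that are entirely NaN.
    valid_mask = ~np.isnan(a).any(1)
    out = np.zeros(len(a), dtype=int)
    out[valid_mask] = searchsorted2d(a[valid_mask], b[valid_mask])
    return out

searchsorted2d_nan(a, v)  # -> array([ 1,  9, 10,  6,  9,  0,  1,  9, 10,  6,  9])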

3 Comments

Thanks a lot. It's a great idea and worked well with the sample, but some issues came up with the real data, which contains NaNs. Could you give me an idea of how to handle this?
@kokon Please post sample data, the expected output, and hopefully a working solution, even one with loops.
Since comments have a character limit, I updated the sample in the original post above to include NaN records. Currently, with your solution, the rows after the NaN records return negative numbers, which differ from the expected output: array([ 1, 9, 10, 6, 9, 0, -10, -20, -30, -40, -50])
