4

This answer explains how to find the nearest (sorted) array element to a single point, in a manner efficient for large arrays (slightly modified):

def arg_nearest(array, value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return idx-1
    else:
        return idx

If, instead, we want to find the array elements nearest a set of points (i.e. a second array); are there efficient (by speed, for large arrays) ways of extending this besides using a for-loop?

Some test cases:

>>> xx = [0.2, 0.8, 1.3, 1.5, 2.0, 3.1, 3.8, 3.9, 4.5, 5.1, 5.5]
>>> yy = [1, 2, 3, 4, 5]
>>> of_x_nearest_y(xx, yy)
[0.5, 2.0, 3.1, 3.9, 5.1]

>>> xx = [0.2, 0.8, 1.3, 1.5, 2.0, 3.1, 3.8, 3.9, 4.5, 5.1, 5.5]
>>> yy = [-2, -1, 4.6, 5.8]
>>> of_x_nearest_y(xx, yy)
[0.2, 0.2, 4.5, 5.5]

Edit: assuming both arrays are sorted, you can do a little better than a completely naive for-loop by excluding values below those already matched, i.e.

def args_nearest(options, targets):
    locs = np.zeros(targets.size, dtype=int)
    prev = 0
    for ii, tt in enumerate(targets):
        locs[ii] = prev + arg_nearest(options[prev:], tt)
        prev = locs[ii]
    return locs
2
  • searchsorted takes an array of values to search for, so it isn't too difficult to modify arg_nearest for your job. Commented Jun 15, 2016 at 17:16
  • @user2357112 hmmm, good point! Commented Jun 15, 2016 at 17:17

1 Answer 1

3

You can make few changes to extend it for an array of elements in value, like so -

idx = np.searchsorted(xx, yy, side="left").clip(max=xx.size-1)
mask = (idx > 0) &  \
       ( (idx == len(xx)) | (np.fabs(yy - xx[idx-1]) < np.fabs(yy - xx[idx])) )
out = xx[idx-mask]

Explanation

Nomenclature : array is the array in which we are looking to place elements from value to maintain the sorted nature of array.

Changes needed to extend the solution for a single element to many elements for searching :

1] Clip the indices array idx obtained from np.searchsorted at a max. of array.size-1, because for elements in value that are larger than the maximum of array, we need to make idx indexable by array.

2] Introduce numpy to replace math to do those operations in a vectorized manner.

3] Replace the conditional statement by the trick of idx - mask. In this case, internally Python would up-convert mask to an int array to match up with the datatype of idx. Thus, all the True elements become 1 and thus for True elements we would effectively have idx-1, which is the True case of the IF conditional statement in the original code.

Sign up to request clarification or add additional context in comments.

3 Comments

Beautiful! I just came up with (effectively) the same solution, except involving 8 lines with numerous filters and a couple ~ inversions... you win!
@DilithiumMatrix Interesting problem really and should be useful in solving many other nearest-based problems! Before this, I would have gone with a brute-force broadcasting based solution : xx[np.abs(xx[:,None] - yy).argmin(0)]. But this searchsorted based solution should scale up pretty nicely for large arrays. Thank you for introducing us this efficient idea!
Haha, always glad for my struggles to be useful!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.