How to remove overlapping blocks from numpy array?

Question

I'm using cv2.goodFeaturesToTrack function to find feature points in an image. The end goal is to extract square blocks of certain size, with feature points being the centers of those blocks.

However, lots of the feature points are close to each other, so the blocks are overlapping, which is not what I want.

This is an example of all feature points (centers):

array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])

Let's say I want to find the first n blocks with a blockRadius = 400 which are not overlapping. Any ideas on how to achieve this?

mathfux · Accepted Answer · 2020-11-10 15:43:55Z

You could get closer with scipy.spatial.KDTree - though it doesn't support querying blocks that consists of distinct amounts of points in blocks. So it can be used in conjunction with another library python-igraph that allows to find connected components of close points in a fast manner:

from scipy.spatial import KDTree
import igraph as ig

data = np.array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])
edges1 = KDTree(data[:,:1]).query_pairs(r=400)
edges2 = KDTree(data[:,1:]).query_pairs(r=400)
g = ig.Graph(n = len(data), edges=edges1 & edges2)
i = g.clusters()

So clusters corresponds to sequences of indices of block points of some kind of internal type igraph. There's a quick preview:

>>> print(i)
Clustering with 8 elements and 2 clusters
[0] 0, 2, 3, 4, 5, 6
[1] 1, 7
>>> pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #number of colors used
color = pal.get_many(i.membership) #list of color tags
ig.plot(g,  bbox = (200, 100), layout=g.layout('circle'), vertex_label=g.vs.indices, 
        vertex_color = color, vertex_size = 12, vertex_label_size = 8)

Example of usage:

>>> [data[n] for n in i] #or list(i)
[array([[3536., 1419.],
        [3504., 1400.],
        [3574., 1505.],
        [3672., 1453.],
        [3671., 1442.],
        [3489., 1429.]]),
 array([[2976., 1024.],
        [3108.,  737.]])]

Remark: this method allows to work with pairs of close points instead of n*n matrix which is more efficient in memory in some cases.

Could have sworn scipy used to have a ball_tree algorithm that could deal with arbitrary metrics at some point. Only downside of this method is you're stuck with minkowski p-norms, so no chebyshev.

Daniel F · Accepted Answer · 2020-11-12 10:40:58Z

You'll need something iterative to do that, as recurrent dropouts like this aren't vectorizable. Something like this will work, I think

from scipy.spatial.distance import pdist, squareform

c = np.array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])

dists = squareform(pdist(c, metric = 'chebyshev'))     # distance matrix, chebyshev here since you seem to want blocks
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]                        # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400] #drop indices that are inside threshhold
        out.append(indices[0])   # add the next index that hasn't been dropped to the output
    except:
        break   # once you run out of indices, you'll get an IndexError and you're done
print(out)
[0, 1]

let's try with a whole bunch of points:

np.random.seed(42)
c = np.random.rand(10000, 2) * 800
dists = squareform(pdist(c, metric = 'chebyshev'))     # distance matrix, checbyshev here since you seem to want squares
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]                        # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400] #drop indices that are inside threshhold
        out.append(indices[0])   # add the next index that hasn't been dropped to the output
    except:
        break   # once you run out of indices, you'll get an IndexError and you're done
print(out, pdist(c[out], metric = 'chebyshev'))
[0, 2, 6, 17] [635.77582886 590.70015659 472.87353138 541.13920029 647.69071411
 476.84658995]

So, 4 points (makes sense since 4 400x400 blocks tile a 800x800 space with 4 tiles), mostly low values (17 << 10000) and distance between kept points is always > 400

Just want to say that if you have a lot of points and my method is starting to bog down /run into memory errors, @mathfux 's method should be significantly faster (although a lot harder to understand without a degree in graph theory).
I will probably only need max. 8 final centers, so hopefully won't get into those problems.

Collectives™ on Stack Overflow

How to remove overlapping blocks from numpy array?

2 Answers 2

1 Comment

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Related