I've got a NumPy array containing labels. I'd like to calculate a number for each label based on its size and bounding box. How can I write this more efficiently so that it's practical to use on large arrays (~15000 labels)?
from numpy import array, argwhere, zeros

A = array([[1, 1, 0, 3, 3],
           [1, 1, 0, 0, 0],
           [1, 0, 0, 2, 2],
           [1, 0, 2, 2, 2]])
B = zeros(4)
for label in range(1, 4):
    # get the bounding box of the label (y1, x1 are exclusive)
    label_points = argwhere(A == label)
    (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1
    # assume I've computed the size of each label in a numpy array size_A
    B[label] = myfunc(y0, x0, y1, x1, size_A[label])
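The loop above scans the whole of A once per label (A == label plus argwhere), so its cost grows with labels × pixels. One common way around that, a swapped-in technique rather than the asker's code, is to let SciPy find every bounding box in a single pass with scipy.ndimage.find_objects and count every label's size with numpy.bincount. This is a minimal sketch, assuming SciPy is installed and that 0 is the background label; label_bboxes_and_sizes is a hypothetical helper name.

```python
import numpy as np
from scipy import ndimage


def label_bboxes_and_sizes(A):
    """Return y0, x0, y1, x1 and size arrays indexed by label value.

    Index 0 is the background. Labels missing from A keep -1 for their box
    and get size 0. y1 and x1 are exclusive, matching max(0) + 1 above.
    """
    n = int(A.max())
    # One scan of A: for labels 1..n, a tuple of slices (the bounding box),
    # or None if that label does not occur.
    slices = ndimage.find_objects(A)
    y0 = np.full(n + 1, -1)
    x0 = np.full(n + 1, -1)
    y1 = np.full(n + 1, -1)
    x1 = np.full(n + 1, -1)
    # This loop only touches one small tuple per label, not the whole array.
    for label, sl in enumerate(slices, start=1):
        if sl is None:
            continue
        ys, xs = sl
        y0[label], y1[label] = ys.start, ys.stop
        x0[label], x1[label] = xs.start, xs.stop
    # One more scan of A: pixel count for every label value.
    size_A = np.bincount(A.ravel(), minlength=n + 1)
    return y0, x0, y1, x1, size_A


A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])
y0, x0, y1, x1, size_A = label_bboxes_and_sizes(A)
```

If myfunc happens to work element-wise on arrays (an assumption, since it isn't shown), B[1:] = myfunc(y0[1:], x0[1:], y1[1:], x1[1:], size_A[1:]) replaces the loop entirely; otherwise a plain loop over these small arrays no longer rescans A for each label.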
… A in the real use case? …

… myfunc, which could probably be parallelized by storing y0, x0, y1, x1 in separate arrays, moving the call out of the loop, and calling the function only once.

Otherwise, if speed really counts, you may want to check whether it's worth writing some C code. I found Cython to be really comfortable when working with NumPy arrays.

… argwhere call for every label.
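The suggestion of collecting y0, x0, y1, x1 in separate arrays and calling myfunc only once can also be done in plain NumPy, without SciPy and without any per-label Python loop, by using the unbuffered reductions numpy.minimum.at and numpy.maximum.at indexed by label. This is a minimal sketch under stated assumptions: labels are non-negative integers, bbox_arrays is a hypothetical helper name, and myfunc_vec is a hypothetical vectorized stand-in for the real myfunc (here simply the fraction of the bounding box covered by the label) used only to show the single-call shape.

```python
import numpy as np


def bbox_arrays(A):
    """Per-label bounding boxes and sizes, indexed by label value (0 = background).

    y1 and x1 are exclusive. Entries for labels absent from A have size 0 and
    a meaningless box, so filter on size_A > 0 if labels can be missing.
    """
    n = int(A.max())
    rows, cols = np.indices(A.shape)
    labels, rows, cols = A.ravel(), rows.ravel(), cols.ravel()
    big = max(A.shape)
    y0 = np.full(n + 1, big)
    x0 = np.full(n + 1, big)
    y1 = np.full(n + 1, -1)
    x1 = np.full(n + 1, -1)
    # ufunc.at does not buffer, so every pixel of a label updates its entry.
    np.minimum.at(y0, labels, rows)
    np.minimum.at(x0, labels, cols)
    np.maximum.at(y1, labels, rows)
    np.maximum.at(x1, labels, cols)
    size_A = np.bincount(labels, minlength=n + 1)
    return y0, x0, y1 + 1, x1 + 1, size_A


def myfunc_vec(y0, x0, y1, x1, size):
    # Hypothetical stand-in for myfunc: label pixels / bounding-box area,
    # computed element-wise for all labels at once.
    return size / ((y1 - y0) * (x1 - x0))


A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])
y0, x0, y1, x1, size_A = bbox_arrays(A)
B = np.zeros(len(size_A))
# Skip index 0 (background) and call the function once for every label.
B[1:] = myfunc_vec(y0[1:], x0[1:], y1[1:], x1[1:], size_A[1:])
```

Each reduction makes one pass over the pixels instead of one full scan of A per label, which is what makes ~15000 labels tractable in this approach.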