Fastest way to perform calculations on every NXN sub-array in 2D numpy array

Question

I have a 2D numpy array which represents a grayscale image. I need to extract every N x N sub-array within that array, with a specified overlap between sub-arrays, and calculate a property such as the mean, standard deviation, or median.

The code below performs this task but is quite slow because it uses Python for loops. Any ideas on how to vectorize this calculation or otherwise speed it up?

import numpy as np

img = np.random.randn(100, 100)
N = 4
step = 2

h, w = img.shape
out = []
for i in range(0, h - N, step):
    outr = []
    for j in range(0, w - N, step):
        outr.append(np.mean(img[i:i+N, j:j+N]))
    out.append(outr)
out = np.array(out)

Paul Panzer · Accepted Answer · 2018-12-19 17:52:37Z

For mean and standard deviation, there is a fast cumsum based solution.

Here are timings for a 500x200 image, 30x20 window and step sizes 5 and 3. For comparison I use skimage.util.view_as_windows with numpy mean and std.

mn + sd using cumsum     1.1531693299184553 ms
mn using view_as_windows 3.495307120028883 ms
sd using view_as_windows 21.855629019846674 ms

Code:

import numpy as np
from math import gcd
from timeit import timeit

def wsum2d(A, winsz, stepsz, canoverwriteA=False):
    M, N = A.shape
    m, n = winsz
    i, j = stepsz
    for X, x, s in ((M, m, i), (N, n, j)):
        g = gcd(x, s)
        if g > 1:
            X //= g
            x //= g
            s //= g
            A = A[:X*g].reshape(X, g, -1).sum(axis=1)
        elif not canoverwriteA:
            A = A.copy()
        canoverwriteA = True
        A[x:] -= A[:-x]
        A = A.cumsum(axis=0)[x-1::s]
        A = A.T
    return A

def w2dmnsd(A, winsz, stepsz):
    # combine A and A*A into a complex, so overheads apply only once
    M21 = wsum2d(A*(A+1j), winsz, stepsz, True)
    M2, mean_ = M21.real / np.prod(winsz), M21.imag / np.prod(winsz)
    sd = np.sqrt(M2 - mean_*mean_)
    return mean_, sd

# test
np.random.seed(0)
A = np.random.random((500, 200))
wsz = (30, 20)
stpsz = (5, 3)
mn, sd = w2dmnsd(A, wsz, stpsz)
from skimage.util import view_as_windows
Av = view_as_windows(A, wsz, stpsz) # this emits a warning on my system
assert np.allclose(mn, np.mean(Av, axis=(2, 3)))
assert np.allclose(sd, np.std(Av, axis=(2, 3)))
from timeit import repeat

print('mn + sd using cumsum    ', min(repeat(lambda: w2dmnsd(A, wsz, stpsz), number=100))*10, 'ms')
print('mn using view_as_windows', min(repeat(lambda: np.mean(Av, axis=(2, 3)), number=100))*10, 'ms')
print('sd using view_as_windows', min(repeat(lambda: np.std(Av, axis=(2, 3)), number=100))*10, 'ms')

max9111 · Accepted Answer · 2018-12-19 17:23:30Z

If Numba is an option the only thing to do is to avoid the list appends (It does work with list appends too, but slower. To make use of parallization too, rewrote the implementation a bit to avoid the step within range, which is not supported when using parfor.

Example

@nb.njit(error_model='numpy',parallel=True)
def calc_p(img,N,step):
  h,w=img.shape

  i_w=(h - N)//step
  j_w=(w - N)//step
  out = np.empty((i_w,j_w))
  for i in nb.prange(0, i_w):
      for j in range(0, j_w):
          out[i,j]=np.std(img[i*step:i*step+N, j*step:j*step+N])
  return out

def calc_n(img,N,step):
  h, w = img.shape
  out = []
  for i in range(0, h - N, step):
      outr = []
      for j in range(0, w - N, step):
          outr.append(np.std(img[i:i+N, j:j+N]))
      out.append(outr)
  return(np.array(out))

Timings

All timings are without compilation overhead of about 0.5s (the first call to the function is excluded from the timings).

#Data
img = np.random.randn(100, 100)
N = 4
step = 2

calc_n :17ms
calc_p :0.033ms

Because this is actually a rolling mean there is further room for improvement if N gets larger.

I thought all of the answers were useful, but I ended up implementing this one for my own project due to the dramatic enhancement in speed. Thanks!

Parth Verma · Accepted Answer · 2018-12-19 15:24:34Z

1

You could use scikit-image block_reduce:

So your code becomes:

import numpy as np
import skimage.measure

N = 4

# Your main array
a = np.arange(9).reshape(3,3)

mean = skimage.measure.block_reduce(a, (N,N), np.mean) 
std_dev = skimage.measure.block_reduce(a, (N,N), np.std)
median = skimage.measure.block_reduce(a, (N,N), np.median)

However, the above code only works for strides/steps of size 1.

For mean, you could use mean pooling which is available in any modern day ML package. As for median and standard deviation, this seems the right approach.

answered Dec 19, 2018 at 15:24

Parth Verma

8301 gold badge8 silver badges16 bronze badges

1 Comment

Paul Brodersen Over a year ago

This solution is incorrect as it results in array that is downsampled by a factor of N, not in an array that is downsampled by a factor of step.

Paul Brodersen · Accepted Answer · 2018-12-19 17:18:34Z

The general case can be solved using scipy.ndimage.generic_filter:

import numpy as np

from scipy.ndimage import generic_filter

img = np.random.randn(100, 100)

N = 4
filtered = generic_filter(img.astype(np.float), np.std, size=N)

step = 2
output = filtered[::step, ::step]

However, this may actually run not much faster than a simple for loop.

To apply a mean and median filter you can use skimage.rank.mean and skimage.rank.median, respectively, which should be faster. There is also scipy.ndimage.median_filter. Otherwise, the mean can also be effectively computed through simple convolution with an (N, N) array with values 1./N^2. For the standard deviation you probably have to bite the bullet and use generic_filter unless your step size is larger or equal to N.

Collectives™ on Stack Overflow

Fastest way to perform calculations on every NXN sub-array in 2D numpy array

4 Answers 4

Comments

1 Comment

1 Comment

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

1 Comment

1 Comment

Comments

Linked

Related