
I have a function foo that takes an NxM numpy array as an argument and returns a scalar value. I also have an AxNxM numpy array, data, over which I'd like to map foo to get a resultant numpy array of length A.

Currently, I'm doing this:

result = numpy.array([foo(x) for x in data])

It works, but it seems like I'm not taking advantage of the numpy magic (and speed). Is there a better way?

I've looked at numpy.vectorize, and numpy.apply_along_axis, but neither works for a function of 2D arrays.
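For what it's worth, newer NumPy versions do add a signature argument to numpy.vectorize that accepts functions of 2D arrays, though it still loops in Python internally, so it's a convenience rather than a speedup (a sketch, assuming foo and data as above):

import numpy as np

# gufunc-style signature: consume an (n, m) core, produce a scalar.
# numpy.vectorize then maps over the leading A axis -- still a Python
# loop under the hood, so don't expect it to be faster than the list
# comprehension.
foo_vec = np.vectorize(foo, signature='(n,m)->()')
result = foo_vec(data)   # shape (A,)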

EDIT: I'm doing boosted regression on 24x24 image patches, so my AxNxM is something like 1000x24x24. What I called foo above applies a Haar-like feature to a patch (so, not terribly computationally intensive).
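For illustration, a minimal stand-in for what foo computes might be a simple two-rectangle feature (hypothetical; the real features vary in position and scale):

import numpy as np

# Hypothetical two-rectangle Haar-like feature: sum of the left half
# of the patch minus the sum of the right half.
def foo(patch):
    n, m = patch.shape
    return patch[:, :m // 2].sum() - patch[:, m // 2:].sum()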

2 Comments

  • There might be a way to recode foo so that it can accept a numpy array of arbitrary dimension, applying its computations to the last two axes (see the sketch after these comments). But we'd have to see how foo is coded to make specific suggestions. Commented May 5, 2010 at 11:58
  • I've added more detail about my specific problem. Would it make sense to leave data as is, re-code foo to take an index parameter, and then vectorize it and map it over an arange(len(data))? Commented May 5, 2010 at 19:57
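A sketch of the recoding suggested in the first comment, assuming foo is built from reductions (using the hypothetical two-rectangle feature above as the example): the computation moves onto the last two axes, so any number of leading dimensions works and the Python loop disappears entirely.

import numpy as np

# Batched version: reductions over the last two axes, so an (A, N, M)
# input yields an (A,) output with no explicit Python loop.
def foo_batched(patches):
    m = patches.shape[-1]
    left = patches[..., :m // 2].sum(axis=(-2, -1))
    right = patches[..., m // 2:].sum(axis=(-2, -1))
    return left - right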

2 Answers


If NxM is big (say, 100), then the cost of iterating over A will be amortized into basically nothing.

Say the array is 1000 x 100 x 100.

Iterating is O(1000), but the cumulative cost of the inner function calls is O(1000 x 100 x 100), 10,000 times larger, so the loop overhead is negligible by comparison. (Note: my terminology is a bit wonky, but I do know what I'm talking about.)

I'm not sure, but you could try this:

result = numpy.empty(data.shape[0])   # preallocate the output array
for i in range(len(data)):
    result[i] = foo(data[i])          # fill it in place, one patch at a time

You would save a bit of memory allocation by not building the intermediate list, but the loop overhead would be slightly greater.

Or you could write a parallel version of the loop and split it across multiple processes. That could be a lot faster, depending on how intensive foo is (it would have to be expensive enough to offset the cost of shipping the data to the workers).
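A minimal sketch of that parallel variant using multiprocessing.Pool (assuming foo is a picklable top-level function; whether this wins depends on how expensive foo is relative to handing each patch to a worker):

import numpy as np
from multiprocessing import Pool

def parallel_map(func2d, data, processes=4):
    # Each worker receives one NxM patch; results come back in order.
    with Pool(processes) as pool:
        return np.array(pool.map(func2d, data))

if __name__ == '__main__':
    data = np.arange(3 * 4 * 5, dtype=float).reshape((3, 4, 5))
    print(parallel_map(np.trace, data))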


1 Comment

Variation: result = np.fromiter(itertools.imap(f, data), dtype=data.dtype, count=data.shape[0])
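A runnable Python 3 version of the variant in the comment above (itertools.imap is gone in Python 3, and the output dtype should match what foo returns, assumed float here):

import numpy as np

# fromiter avoids building an intermediate list; count lets numpy
# preallocate the output array up front.
result = np.fromiter((foo(x) for x in data), dtype=np.float64, count=data.shape[0])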

You can achieve that by reshaping your 3D array into a 2D array with the same leading dimension, and wrapping your function foo with a function that works on 1D arrays, reshaping each one back as required by foo. An example (using trace instead of foo):

import numpy as np

def apply2d_along_first(func2d, arr3d):
    a, n, m = arr3d.shape
    def func1d(arr1d):
        # Restore the 1D row to its original NxM shape before calling func2d.
        return func2d(arr1d.reshape((n, m)))
    # Flatten each NxM block into one row of a 2D array.
    arr2d = arr3d.reshape((a, n * m))
    return np.apply_along_axis(func1d, -1, arr2d)

A, N, M = 3, 4, 5
data = np.arange(A * N * M).reshape((A, N, M))

print(data)
print(apply2d_along_first(np.trace, data))

Output:

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]
[ 36 116 196]

1 Comment

The np.fromiter(imap(...)) variant is 3-5 times faster than apply2d_along_first().
