
I have a function foo that takes an NxM numpy array as an argument and returns a scalar value. I also have an AxNxM numpy array, data, over which I'd like to map foo to get a resultant numpy array of length A.

Currently, I'm doing this:

result = numpy.array([foo(x) for x in data])

It works, but it seems like I'm not taking advantage of the numpy magic (and speed). Is there a better way?

I've looked at numpy.vectorize, and numpy.apply_along_axis, but neither works for a function of 2D arrays.
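For what it's worth, newer NumPy versions do add a signature argument to numpy.vectorize that accepts functions of 2D arrays, though it still loops in Python internally, so it's a convenience rather than a speedup (a sketch, assuming foo and data as above):

import numpy as np

# gufunc-style signature: consume an (n, m) core, produce a scalar.
# numpy.vectorize then maps over the leading A axis -- still a Python
# loop under the hood, so don't expect it to be faster than the list
# comprehension.
foo_vec = np.vectorize(foo, signature='(n,m)->()')
result = foo_vec(data)   # shape (A,)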

EDIT: I'm doing boosted regression on 24x24 image patches, so my AxNxM is something like 1000x24x24. What I called foo above applies a Haar-like feature to a patch (so, not terribly computationally intensive).
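For illustration, a minimal stand-in for what foo computes might be a simple two-rectangle feature (hypothetical; the real features vary in position and scale):

import numpy as np

# Hypothetical two-rectangle Haar-like feature: sum of the left half
# of the patch minus the sum of the right half.
def foo(patch):
    n, m = patch.shape
    return patch[:, :m // 2].sum() - patch[:, m // 2:].sum()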

2 Comments

  • There might be a way to recode foo so that it can accept a numpy array of arbitrary dimension, applying its computations to the last two axes (see the sketch after these comments). But we'd have to see how foo is coded to make specific suggestions. Commented May 5, 2010 at 11:58
  • I've added more detail about my specific problem. Would it make sense to leave data as is, re-code foo to take an index parameter, and then vectorize it and map it over an arange(len(data))? Commented May 5, 2010 at 19:57
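A sketch of the recoding suggested in the first comment, assuming foo is built from reductions (using the hypothetical two-rectangle feature above as the example): the computation moves onto the last two axes, so any number of leading dimensions works and the Python loop disappears entirely.

import numpy as np

# Batched version: reductions over the last two axes, so an (A, N, M)
# input yields an (A,) output with no explicit Python loop.
def foo_batched(patches):
    m = patches.shape[-1]
    left = patches[..., :m // 2].sum(axis=(-2, -1))
    right = patches[..., m // 2:].sum(axis=(-2, -1))
    return left - right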

2 Answers


If NxM is big (say, 100), then the cost of iterating over A will be amortized into basically nothing.

Say the array is 1000 x 100 x 100.

Iterating is O(1000), but the cumulative cost of the inner function calls is O(1000 x 100 x 100), 10,000 times larger, so the loop overhead is negligible by comparison. (Note: my terminology is a bit wonky, but I do know what I'm talking about.)

I'm not sure, but you could try this:

result = numpy.empty(data.shape[0])   # preallocate the output array
for i in range(len(data)):
    result[i] = foo(data[i])          # fill it in place, one patch at a time

You would save a bit of memory allocation by not building the intermediate list, but the loop overhead would be slightly greater.

Or you could write a parallel version of the loop and split it across multiple processes. That could be a lot faster, depending on how intensive foo is (it would have to be expensive enough to offset the cost of shipping the data to the workers).
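A minimal sketch of that parallel variant using multiprocessing.Pool (assuming foo is a picklable top-level function; whether this wins depends on how expensive foo is relative to handing each patch to a worker):

import numpy as np
from multiprocessing import Pool

def parallel_map(func2d, data, processes=4):
    # Each worker receives one NxM patch; results come back in order.
    with Pool(processes) as pool:
        return np.array(pool.map(func2d, data))

if __name__ == '__main__':
    data = np.arange(3 * 4 * 5, dtype=float).reshape((3, 4, 5))
    print(parallel_map(np.trace, data))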


1 Comment

Variation: result = np.fromiter(itertools.imap(f, data), dtype=data.dtype, count=data.shape[0])
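A runnable Python 3 version of the variant in the comment above (itertools.imap is gone in Python 3, and the output dtype should match what foo returns, assumed float here):

import numpy as np

# fromiter avoids building an intermediate list; count lets numpy
# preallocate the output array up front.
result = np.fromiter((foo(x) for x in data), dtype=np.float64, count=data.shape[0])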

You can achieve that by reshaping your 3D array into a 2D array with the same leading dimension, and wrapping your function foo with a function that works on 1D arrays, reshaping each one back as required by foo. An example (using trace instead of foo):

import numpy as np

def apply2d_along_first(func2d, arr3d):
    a, n, m = arr3d.shape
    def func1d(arr1d):
        # Restore the 1D row to its original NxM shape before calling func2d.
        return func2d(arr1d.reshape((n, m)))
    # Flatten each NxM block into one row of a 2D array.
    arr2d = arr3d.reshape((a, n * m))
    return np.apply_along_axis(func1d, -1, arr2d)

A, N, M = 3, 4, 5
data = np.arange(A * N * M).reshape((A, N, M))

print(data)
print(apply2d_along_first(np.trace, data))

Output:

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]
[ 36 116 196]

1 Comment

The np.fromiter(imap(...)) variant is 3-5 times faster than apply2d_along_first().
