Avoid memory errors when integrating large array with Numpy

Question

I have a 601x350x200x146 numpy float64 array which, according to my calculations takes about 22.3 Gb of memory. My output of free -m tells me I have about 100Gb of free memory so it fits fine. However, when integrating with

result = np.trapz(large_arr, axis=3)

I get a memory error. I understand that this is because of the intermediate arrays that numpy.trapz has to create to perform the integration. But I'm looking to see if there's a way around it, or at least a way to minimize the extra use of memory.

I have read about memory errors and I know of things to avoid this: one is placing a gc.collect() call before the integration. I tried this and it didn't work.

The other one is using the *= operators such as writing arr*=a instead of arr=arr*a, which I can't really do here. So I don't know what else to try.

Does anyone know of a way to do this operation without raising a memory error?

You can reproduce the error with:

arr = np.ones((601,350,200,146), dtype=np.float64)
arr=np.trapz(arr, axis=3)

although you'll have to scale down the size to match your memory size.

It looks to me like that array is 49 gigabytes. NumPy makes a lot of use of intermediate scratch arrays; the np.trapz implementation needs more of those than your memory can hold. — user2357112
– user2357112, Commented Dec 16, 2016 at 21:14
Do you know what trapz is doing? Even if ``arr` itself fits, temporary arrays created by trapz might not. — hpaulj
– hpaulj, Commented Dec 16, 2016 at 21:15
@user2357112 Yeah, that's what I thought. But what you're saying is that there's just no way? — TomCho
– TomCho, Commented Dec 16, 2016 at 21:16
Numexpr, Numba, or Cython could help, although you'd need to rewrite the np.trapz implementation instead of calling np.trapz. There's only about one line of actual math in it, though, and you don't need most of the boilerplate, so most of the hassle would be in installing and reading up on a new tool. — user2357112
– user2357112, Commented Dec 16, 2016 at 21:21
I think you might be able to get the memory usage down sufficiently in pure NumPy by being careful about in-place operations, but you still wouldn't be able to use np.trapz for the work. — user2357112
– user2357112, Commented Dec 16, 2016 at 21:23

Warren Weckesser · Accepted Answer · 2016-12-16 22:03:32Z

numpy.trapz provides some convenience, but the actual calculation is very simple. To avoid large temporary arrays, just implement it yourself:

In [37]: x.shape
Out[37]: (2, 4, 4, 10)

Here's the result of numpy.trapz(x, axis=3):

In [38]: np.trapz(x, axis=3)
Out[38]: 
array([[[ 43. ,  48.5,  46.5,  67. ],
        [ 35.5,  39.5,  52.5,  35. ],
        [ 44.5,  47.5,  34.5,  39.5],
        [ 54. ,  40. ,  46.5,  50.5]],

       [[ 42. ,  60. ,  55.5,  51. ],
        [ 51.5,  40. ,  52. ,  42.5],
        [ 48.5,  43. ,  32. ,  36.5],
        [ 42.5,  38. ,  38. ,  45. ]]])

Here's the calculation written to use no large intermediate arrays. (The slice x[:,:,:,1:-1] does not copy the data associated with the array.)

In [48]: 0.5*(x[:,:,:,0] + 2*x[:,:,:,1:-1].sum(axis=3) + x[:,:,:,-1])
Out[48]: 
array([[[ 43. ,  48.5,  46.5,  67. ],
        [ 35.5,  39.5,  52.5,  35. ],
        [ 44.5,  47.5,  34.5,  39.5],
        [ 54. ,  40. ,  46.5,  50.5]],

       [[ 42. ,  60. ,  55.5,  51. ],
        [ 51.5,  40. ,  52. ,  42.5],
        [ 48.5,  43. ,  32. ,  36.5],
        [ 42.5,  38. ,  38. ,  45. ]]])

If x has shape (m, n, p, q), the few temporary arrays that are generated in that expression all have shape (m, n, p).

Ah, of course. We can reduce the dimensions immediately, instead of at the last step. You can eliminate the remaining temporary arrays by using += and *=, which could be a little helpful when reducing along a short axis, but the memory savings are small compared to the size of the input and the size of np.trapz's temporaries.

Collectives™ on Stack Overflow

Avoid memory errors when integrating large array with Numpy

1 Answer 1

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Related