3

I have many large multidimensional NP arrays (2D and 3D) used in an algorithm. There are numerous iterations in this, and during each iteration the arrays are recalculated by performing calculations and saving into temporary arrays of the same size. At the end of a single iteration the contents of the temporary arrays are copied into the actual data arrays.

Example:

global A, B # ndarrays
A_temp = numpy.zeros(A.shape)
B_temp = numpy.zeros(B.shape)
for i in xrange(num_iters):
    # Calculate new values from A and B storing in A_temp and B_temp...
    # Then copy values from temps to A and B
    A[:] = A_temp
    B[:] = B_temp

This works fine, however it seems a bit wasteful to copy all those values when A and B could just swap. The following would swap the arrays:

A, A_temp = A_temp, A
B, B_temp = B_temp, B

However there can be other references to the arrays in other scopes which this won't change.

It seems like NumPy could have an internal method for swapping the internal data pointer of two arrays, such as numpy.swap(A, A_temp). Then all variables pointing to A would be pointing to the changed data.

5
  • Can you give an example of "other references to the arrays in other scopes which this won't change"? Commented Mar 31, 2012 at 9:49
  • For example, for the calculation step I could (and will) have multiple threads, called with a function where the fields are passed as arguments. To reduce the overhead of instantiating multiple threads they cycle as well. Commented Apr 1, 2012 at 3:57
  • if I recall correctly, you can't pass the data between threads/processes without copy unless you're using specific features like sharedctypes arrays or so. But I might be wrong... Commented Apr 1, 2012 at 22:04
  • Sharing between threads is not difficult, in fact does not require any extra work. It just requires you to be extra careful about race conditions (and since I am only reading in the loop and only one thread is writing the arrays this is not really an issue). For multiprocessing some additional work is required, but the Scipy site says "It is possible to share memory between processes, including numpy arrays." (scipy.org/ParallelProgramming). Commented Apr 4, 2012 at 4:05
  • indeed, you need to use sharedctypes arrays for that (I only worked with multiprocessing so I ain't sure about threading of course) Commented Apr 6, 2012 at 13:17

3 Answers 3

2

Even though you way should work as good (I suspect the problem is somewhere else), you can try doing it explicitly:

import numpy as np
A, A_temp = np.frombuffer(A_temp), np.frombuffer(A)

It's not hard to verify that your method works as well:

>>> import numpy as np
>>> arr = np.zeros(100)
>>> arr2 = np.ones(100)
>>> print arr.__array_interface__['data'][0], arr2.__array_interface__['data'][0]
152523144 152228040

>>> arr, arr2 = arr2, arr
>>> print arr.__array_interface__['data'][0], arr2.__array_interface__['data'][0]
152228040 152523144

... pointers succsessfully switched

Sign up to request clarification or add additional context in comments.

1 Comment

The frombuffer idea is interesting, however it does require a "reshape(A.shape)" command since it only returns 1D arrays. Besides that though this does not solve the issue with alternate references however. I tried a few different things with it, and what is needed a "setbuffer" function... This has given me the idea to look into views though (which I don't really know too much about).
1

Perhaps you could solve this by adding a level of indirection.

You could have an "array holder" class. All that would do is keep a reference to the underlying NumPy array. Implementing a cheap swap operation for a pair of these would be trivial.

If all external references are to these holder objects and not directly to the arrays, none of those references would get invalidated by a swap.

3 Comments

This is definitely a viable alternative. It just seems silly since the ndarray already has a pointer to the proper data that could be changed... if I cannot figure out a way using views this will be the answer.
@thaimin: The Pythonic way to swap two things is a, b = b, a. This swaps the references. I can't think of a single standard class that implements a swap interface like the one you're looking for. It's just not done in Python since it's almost never needed.
I understand. I attempted to use views which would work if it let you re-assign the "base" field. I attempted to assign the "data" field which almost worked but doesn't allow swapping (although the first assignment does exactly what I want and a.data, b.data = b.data, a.data does swap "one" of them).
1

I realize this is an old question, but for what it's worth you could also swap data between two ndarray buffers (without a temp copy) by performing an xor swap:

A_bytes = A.view('ubyte')
A_temp_bytes = A.view('ubyte')
A_bytes ^= A_temp_bytes
A_temp_bytes ^= A_bytes
A_bytes ^= A_temp_bytes

Since this was done on views, if you look at the original A and A_temp arrays (in whatever their original dtype was) their values should be correctly swapped. This is basically equivalent to the numpy.swap(A, A_temp) you were looking for. It's unfortunate that it requires 3 loops--if this were implemented as a ufunc (maybe it should be) it would be a lot faster.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.