
Today I used a numpy array for some calculations and found a strange problem. For example, assuming I have already imported numpy.arange in IPython, I run the following:

In [5]: foo = arange(10)                                                      

In [8]: foo1 = foo[arange(3)]                                                 

In [11]: foo1[:] = 0                                                          

In [12]: foo
Out[12]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]: foo2 = foo[0:3]                                                      

In [19]: foo2[:]=0                                                            

In [21]: foo
Out[21]: array([0, 0, 0, 3, 4, 5, 6, 7, 8, 9])

The above shows that when I index the array with foo[arange(3)], I get a copy of the selected elements, but when I slice it with foo[0:3], I get a reference to the array slice, so foo changes along with foo2. I then expected foo and foo2 to have the same id, but that does not seem to be true:

In [59]: id(foo)
Out[59]: 27502608

In [60]: id(foo2)
Out[60]: 28866880

In [61]: id(foo[0])
Out[61]: 38796768

In [62]: id(foo2[0])
Out[62]: 38813248

...

Even stranger, if I keep checking the ids of foo[0] and foo2[0], they are not constant, and sometimes they even match each other!

In [65]: id(foo2[0])
Out[65]: 38928592

In [66]: id(foo[0])                                                          
Out[66]: 37111504

In [67]: id(foo[0])
Out[67]: 38928592

Can anyone explain this? I am really confused by this dynamic feature of Python.

Thanks a lot!

1 Answer

foo[arange(3)]

is not a slice; it is advanced ("fancy") indexing. The elements of arange(3) are used to select elements of foo, which are copied into a new array. An arbitrary selection like this can't be represented efficiently as a view (every element of the view would need its own independent reference, and operations on the view would have to follow far too many pointers), so it returns a new array.
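A small sketch of the copy behaviour, using plain `numpy` imported as `np` (the question imports only `arange`). It uses `np.shares_memory`, a NumPy function not mentioned in the original answer, to confirm that the fancy-indexed result has no memory in common with `foo`:

```python
import numpy as np

foo = np.arange(10)

# Integer-array ("fancy") indexing gathers the selected elements
# into a freshly allocated array.
foo1 = foo[np.arange(3)]
foo1[:] = 0  # this writes into the new array only

print(foo)                           # the original is unchanged
print(np.shares_memory(foo, foo1))   # False: no memory in common
```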

foo[0:3]

is a slice. A slice can be represented efficiently as a view, because it only requires adjusting the starting offset, shape and strides used to walk the existing buffer. Thus, it returns a view.
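A sketch of the view behaviour, assuming a contiguous 1-D integer array as in the question. The `tail` slice is a hypothetical extra example added here to show that a slice starting later simply begins at a later address inside the same buffer, read through `__array_interface__['data']`:

```python
import numpy as np

foo = np.arange(10)

# Basic slicing returns a view: a new array object that points into
# foo's buffer with its own offset, shape and strides.
foo2 = foo[0:3]
foo2[:] = 0  # these writes land in foo's buffer

print(foo)               # [0 0 0 3 4 5 6 7 8 9]
print(foo2.base is foo)  # True: foo2 is a view of foo

# A slice that starts at index 5 begins 5 elements into the same buffer.
tail = foo[5:8]
start_foo = foo.__array_interface__['data'][0]
start_tail = tail.__array_interface__['data'][0]
print(start_tail - start_foo == 5 * foo.itemsize)  # True
```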

id(foo[0])

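foo[0] doesn't refer to a specific Python object. Keeping separate Python objects for every array element would be far too expensive, negating much of the benefit of numpy. Instead, when an indexing operation is performed on a numpy ndarray, numpy constructs a new object to return, so you get a different object each time. IDs are only guaranteed to be unique among objects that exist at the same time: once a temporary such as foo[0] is freed, CPython may reuse its memory for the next temporary, which is why id(foo[0]) sometimes repeats or matches id(foo2[0]).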
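A minimal illustration of the point above: keeping two results of `foo[0]` alive at once shows they are equal in value but are distinct objects. The exact scalar type (for example `numpy.int64`) depends on the platform, so the sketch only checks that it is a NumPy integer:

```python
import numpy as np

foo = np.arange(10)

# Each integer index builds a brand-new numpy scalar object.
a = foo[0]
b = foo[0]

print(isinstance(a, np.integer))  # True: a numpy scalar, not a Python int
print(a == b)                     # True: same value
print(a is b)                     # False: two distinct objects
print(id(a) == id(b))             # False while both are alive

# When the temporary is discarded immediately, as in id(foo[0]),
# CPython may reuse its memory for the next temporary, so such ids
# can repeat. That is an implementation detail, not a guarantee.
```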


5 Comments

Well, then why is id(foo) also different from id(foo2)? Do they use the first element's address as their address?
@shelper: foo isn't foo2. Although they have the same shape, dtype, etc., and although they use the same storage for their elements, they are different objects. I don't think the ID you receive has any relation to the addresses of the array elements; it's the address of a header containing array metadata and a pointer to the storage used for the elements.
Well, I think I understand the issue: foo and foo2 are both wrapped Python objects, and id(foo) just shows the address of the Python object, not of the memory that holds the data, which can actually be obtained with foo.__array_interface__['data'].
You can check whether two arrays share the same base memory through their .base attributes. Note that an array which owns its buffer has base None, so in your case the check is foo2.base is foo, which should evaluate to True.
@Jamie - If I recall correctly, there can be cases where foo.base is not the same as foo2.base even though they share the same memory buffer. It holds for all slicing operations, but not everywhere: things that create a view through __array_interface__ (for example np.lib.stride_tricks.as_strided) won't necessarily show the same base. As I understand it, numpy.may_share_memory(foo, foo2) is the preferred way to check whether two arrays share memory (though I could be wrong there).
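The points raised in the comments can be checked in one short sketch. It compares the object identity with the data pointer, inspects `.base`, and contrasts `.base` with memory-overlap checks for a view built by `as_strided`. The `as_strided` arguments are an arbitrary illustrative choice, whether its result reports `foo` as its base is left unchecked because it may vary between NumPy versions, and `np.shares_memory` (an exact check) is an addition not mentioned in the comments:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

foo = np.arange(10)
foo2 = foo[0:3]

# Different Python objects (array headers), hence different ids...
print(foo is foo2)  # False
# ...but foo2's data pointer is foo's, since the slice starts at element 0.
print(foo.__array_interface__['data'][0]
      == foo2.__array_interface__['data'][0])  # True

# .base links a slice to the array that owns the buffer.
print(foo.base is None)  # True: foo owns its data
print(foo2.base is foo)  # True

# A view built through __array_interface__; here the first 3 elements of foo.
strided = as_strided(foo, shape=(3,), strides=(foo.itemsize,))
# strided.base is not guaranteed to be foo, so compare memory instead.
print(np.may_share_memory(foo, strided))  # True (bounds-based check)
print(np.shares_memory(foo, strided))     # True (exact check, can be slower)

# Fancy indexing copied, so there is no overlap at all.
print(np.may_share_memory(foo, foo[np.arange(3)]))  # False
```

`np.may_share_memory` only compares memory bounds, so it can report `True` for arrays that interleave without actually overlapping; `np.shares_memory` answers exactly but may take longer.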
