I have a question about NumPy array memory management. Suppose I create a NumPy array from a buffer as follows:

>>> s = "abcd"
>>> arr = numpy.frombuffer(buffer(s), dtype = numpy.uint8)
>>> arr.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> del s # What happens to arr?

In the situation above, does 'arr' hold a reference to 's'? If I delete 's', will this free the memory allocated for 's' and leave 'arr' potentially referencing unallocated memory?

Some other questions I have:

  • If this is valid, how does Python know when to free the memory allocated for 's'? The gc.get_referents(arr) function doesn't seem to show that 'arr' holds a reference to 's'.
  • If this is invalid, how can I register a reference to 's' in 'arr' so that the Python GC will automatically reap 's' when all references to it are gone?

2 Answers

The following should clarify things a little:

>>> s = 'abcd'
>>> arr = np.frombuffer(buffer(s), dtype='uint8')
>>> arr.base
<read-only buffer for 0x03D1BA60, size -1, offset 0 at 0x03D1BA00>
>>> del s
>>> arr.base
<read-only buffer for 0x03D1BA60, size -1, offset 0 at 0x03D1BA00>

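In the first case, del s has no effect on the array. What the array points to (arr.base) is a buffer object created from s, which nothing else references, and that buffer in turn holds its own reference to the string, so the string's memory stays alive.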
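Note that the buffer built-in only exists in Python 2. As a rough Python 3 sketch of the same first case (assuming a bytes object passed straight to np.frombuffer; the variable names are just for illustration), the array does not own its data, keeps something in arr.base, and stays readable after the name s is deleted:

```python
import numpy as np

s = b"abcd"
arr = np.frombuffer(s, dtype=np.uint8)

# The array borrows the bytes object's memory instead of copying it.
owns_data = arr.flags.owndata      # False: memory belongs to another object
has_base = arr.base is not None    # arr.base keeps that object alive

del s                              # removes the name only

# The data is still valid because the array holds a reference via arr.base.
values_after_del = arr.tolist()
print(owns_data, has_base, values_after_del)
```

What exactly arr.base is (the original object or a wrapper around it) can differ between NumPy versions, so the sketch only checks that it is not None.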

>>> t = buffer('abcd')
>>> arr = np.frombuffer(t, dtype='uint8')
>>> arr.base
<read-only buffer for 0x03D1BA60, size -1, offset 0 at 0x03C8D920>
>>> arr.base is t
True
>>> del t
>>> arr.base
<read-only buffer for 0x03D1BA60, size -1, offset 0 at 0x03C8D920>

In the second case, when you del t, you get rid of the variable t pointing to the buffer object, but because the array still has a reference to that same buffer, it is not deleted. While I am not sure how to check it, if you now del arr, the buffer object should lose its last reference and be automatically garbage-collected.
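One possible way to check this in Python 3 on CPython is a weak reference. bytes and bytearray themselves do not accept weak references, so this sketch uses a small bytearray subclass (TrackedBytes is a made-up name for illustration, not part of NumPy) and watches when the object actually goes away:

```python
import gc
import weakref

import numpy as np


class TrackedBytes(bytearray):
    """Hypothetical helper: a bytearray subclass, which can be weakly referenced."""


t = TrackedBytes(b"abcd")
watcher = weakref.ref(t)           # does not keep t alive by itself
arr = np.frombuffer(t, dtype=np.uint8)

del t                              # drop the name; the array still refers to the buffer
gc.collect()
alive_after_del_t = watcher() is not None

values = arr.tolist()              # still safe to read

del arr                            # drop the last owner of the buffer
gc.collect()
freed_after_del_arr = watcher() is None

print(alive_after_del_t, values, freed_after_del_arr)
```

The gc.collect() calls are only there for safety; on CPython the reference counts alone should be enough to free the object as soon as arr is deleted.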

1 Comment

You can use sys.getrefcount in CPython to watch the refcount of s increase in both cases. Not that it matters; it just works, of course.

To complement @seberg's comment:

    import ctypes
    import sys

    import numpy as np

    b = bytearray([1, 2, 3])
    b_addr = id(b)  # in CPython, id() is the object's memory address
    print(sys.getrefcount(b) - 1, ctypes.c_long.from_address(b_addr).value)  # => 1 1
    a1 = np.frombuffer(b, dtype=np.int8)
    assert b[0] == a1[0]
    b[0] = b[0] + 1
    assert b[0] == a1[0]
    print(sys.getrefcount(b) - 1, ctypes.c_long.from_address(b_addr).value)  # => 2 2
    a2 = np.frombuffer(b, dtype=np.int8)
    print(sys.getrefcount(b) - 1, ctypes.c_long.from_address(b_addr).value)  # => 3 3
    del a2
    print(sys.getrefcount(b) - 1, ctypes.c_long.from_address(b_addr).value)  # => 2 2
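    del b  # removes the name only; a1 still keeps the bytearray alive
    print(ctypes.c_long.from_address(b_addr).value)  # => 1
    del a1  # last reference gone, so the bytearray is freed
    # Caution: this reads memory that has already been freed, so the value is not guaranteed.
    print(ctypes.c_long.from_address(b_addr).value)  # => 0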

sys.getrefcount(b) returns a value one higher than you might expect "because it includes the (temporary) reference as an argument to getrefcount()" (Python documentation), which is why the code above subtracts 1.
