When I try to save a very large (20000 x 20000 element) array and read it back, I get all zeros:

In [2]: shape = (20000,)*2

In [3]: r = np.random.randint(0, 10, shape)

In [4]: r.tofile('r.data')

In [5]: ls -lh r.data
-rw-r--r--  1 whg  staff   3.0G 23 Jul 16:18 r.data

In [6]: r[:6,:6]
Out[6]:
array([[6, 9, 8, 7, 4, 4],
       [5, 9, 5, 0, 9, 4],
       [6, 0, 9, 5, 7, 6],
       [4, 0, 8, 8, 4, 7],
       [8, 3, 3, 8, 7, 9],
       [5, 6, 1, 3, 1, 4]])

In [7]: r = np.fromfile('r.data', dtype=np.int64)

In [8]: r = r.reshape(shape)

In [9]: r[:6,:6]
Out[9]:
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

np.save() does similar strange things.

After searching the net, I found that there is a known bug on OS X:

https://github.com/numpy/numpy/issues/2806

When I try to read the tostring() data from a file using Python's read(), I get a memory error.

Is there a better way of doing this? Can anyone recommend a pragmatic workaround to this problem?

1 Answer

Use mmap to memory-map the file, and np.frombuffer to create an array that points into the buffer. Tested on x86_64 Linux:

# `r.data` created as in the question
>>> import mmap
>>> with open('r.data', 'rb') as f:
...   m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
... 
>>> r = np.frombuffer(m, dtype='int64')
>>> r = r.reshape(shape)
>>> r[:6, :6]
array([[7, 5, 9, 5, 3, 5],
       [2, 7, 2, 6, 7, 0],
       [9, 4, 8, 2, 5, 0],
       [7, 2, 4, 6, 6, 7],
       [2, 9, 2, 2, 2, 6],
       [5, 2, 2, 6, 1, 5]])

Note that here r is a view of memory-mapped data, which makes it more memory-efficient, but comes with the side effect of automatically picking up changes to the file contents. If you want it to point to a private copy of the data, as the array returned by np.fromfile does, add an r = np.copy(r).
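
The difference between the mapped view and a private copy can be seen on a small stand-in file. This is only a sketch: the file name `small.data` and the 3 x 4 shape are made up for illustration, and it uses the same Unix-only mmap flags as the answer above.

```python
import mmap
import os
import tempfile

import numpy as np

# Small stand-in for the 20000 x 20000 array in the question (hypothetical file).
small = np.arange(12, dtype=np.int64).reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "small.data")
small.tofile(path)

with open(path, "rb") as f:
    # Same flags as in the answer: shared, read-only mapping (Unix only).
    m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

# `view` points into the mapping; `private` owns its own memory.
view = np.frombuffer(m, dtype=np.int64).reshape(small.shape)
private = np.copy(view)

# The mapping is read-only, so the view cannot be written to,
# while the copy can be modified without touching the file.
print(view.flags.writeable, private.flags.writeable)
private[0, 0] = -1
```

Copying a 3 GB array doubles the memory needed, so it is probably only worth it when the array has to be modified or must not follow later changes to the file.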

(Also, as written, this won't run under Windows, which requires slightly different mmap flags.)
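
One way to sidestep the platform-specific flags is the `access` keyword of `mmap.mmap`, which is accepted on both Unix and Windows. The following is a minimal sketch under that assumption; the helper name `load_mapped` and the demo file `demo.data` are hypothetical, and it has not been tried against the OS X bug from the question.

```python
import mmap
import os
import tempfile

import numpy as np


def load_mapped(path, dtype, shape):
    """Hypothetical helper: map `path` read-only and view it as an array."""
    with open(path, "rb") as f:
        # access=ACCESS_READ replaces the Unix-only flags/prot arguments.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The array keeps a reference to the mapping, so it stays valid
    # after the file object has been closed.
    return np.frombuffer(m, dtype=dtype).reshape(shape)


# Small demo file standing in for `r.data`.
demo = np.arange(20, dtype=np.int64).reshape(4, 5)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.data")
demo.tofile(demo_path)

loaded = load_mapped(demo_path, np.int64, demo.shape)
print(loaded[:2, :2])
```

For the file in the question the call would be `load_mapped('r.data', np.int64, shape)`.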