107 questions
3 votes · 2 answers · 114 views
Expanding np.memmap
I have huge np.memmap objects and need to expand them regularly. I was wondering whether my current approach is safe and also the most efficient one, so I started searching the internet. I stumbled across ...
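One safe-looking approach, sketched with made-up file names and sizes: flush and close the old mapping, extend the file on disk, then re-map with the larger shape. The new bytes read as zeros.

```python
import numpy as np

fname = "grow.dat"
a = np.memmap(fname, mode="w+", dtype=np.int64, shape=(4,))
a[:] = np.arange(4)
a.flush()
del a  # release the old, smaller mapping before resizing the file

# Extend the underlying file to hold 8 values; truncate() zero-fills.
with open(fname, "r+b") as f:
    f.truncate(8 * np.dtype(np.int64).itemsize)

# Re-map with the larger shape; the original 4 values are untouched.
b = np.memmap(fname, mode="r+", dtype=np.int64, shape=(8,))
```

Closing the mapping before resizing matters: resizing a file while a mapping on it is live is platform-dependent behavior.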
1 vote · 1 answer · 115 views
Avoiding unnecessary caching of data when using numpy memmap
I have a program that reads through very large (~100GB-TB) binary files in chunks using numpy memmap. The program does a single pass over the data, so there is no need to cache anything since there is ...
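For a single sequential pass, one option is to hand the kernel paging hints via `madvise()`. A minimal sketch, assuming a POSIX system and Python >= 3.8; note that `._mmap` is a private numpy attribute, so this leans on an implementation detail.

```python
import mmap
import numpy as np

fname = "scan.dat"
np.memmap(fname, mode="w+", dtype=np.float64, shape=(4096,)).flush()

m = np.memmap(fname, mode="r", dtype=np.float64, shape=(4096,))
if hasattr(mmap, "MADV_SEQUENTIAL"):
    # Tell the kernel access is sequential: aggressive read-ahead,
    # and already-read pages become eviction candidates sooner.
    m._mmap.madvise(mmap.MADV_SEQUENTIAL)

total = float(m.sum())  # the single pass over the data

if hasattr(mmap, "MADV_DONTNEED"):
    # After the pass, let the kernel drop the cached pages entirely.
    m._mmap.madvise(mmap.MADV_DONTNEED)
```

The `hasattr` guards keep the sketch portable: the `MADV_*` constants only exist on platforms that support them.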
1 vote · 0 answers · 103 views
Working with larger than memory data in numpy
I am working on a project that involves larger-than-memory numpy 3-dimensional arrays. The project will be deployed with AWS Lambda. I am faced with two design choices:
a) Re-write large parts of the ...
-1 votes · 1 answer · 98 views
Numpy memmap corrupts array
I use numpy-2.1.2-cp313-cp313-win_amd64. When I try to load an array via memmap, the array shape and data are corrupted. A minimum reproducible example is below:
>>> a = np.arange(65536)
>>> ...
0 votes · 1 answer · 170 views
How to index a numpy memmap without creating an in-memory copy?
Which indexing operations on numpy.memmap arrays return an in-memory copy vs a view that is still backed by a file? The documentation doesn't explain which indexing operations are "safe".
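A rough demonstration (file name made up): basic slices stay file-backed, while fancy indexing materializes an in-memory copy. Beware that the copy can still carry the `np.memmap` type, so checking `isinstance()` alone is misleading; the reliable test is whether writes reach the file.

```python
import numpy as np

fname = "idx.dat"
m = np.memmap(fname, mode="w+", dtype=np.int32, shape=(100,))

view = m[10:20]       # basic slice: still a view backed by the file
view[0] = 7           # this write lands on disk
m.flush()
on_disk = np.fromfile(fname, dtype=np.int32)

copy = m[[1, 2, 3]]   # fancy indexing: data copied into RAM
copy[0] = 99          # does not touch m[1] or the file
```

The same applies to boolean-mask indexing, which also produces a copy.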
0 votes · 1 answer · 57 views
numpy.load using mmap_mode only works in the VS Code terminal
I am using numpy.load to load .npy data. The code is like this: self.data_memmaps = [np.load(path, mmap_mode='r') for path in data_paths]. If I run the Python script containing this code in VS Code ...
0 votes · 0 answers · 56 views
np.save and np.load with memmap mode raise an OSError
I tried this simple code:
import numpy as np
np.save('tmp.npy', np.empty(128))
tmp = np.load('tmp.npy', mmap_mode='r+')
np.save('tmp.npy', tmp[:64])
It raised an OSError:
------------------------------...
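A sketch of a pattern that avoids this class of error: `np.save()` truncates the target file, which conflicts with a mapping that is still open on that same file. Copy the slice into RAM and drop the mapping before overwriting.

```python
import numpy as np

np.save("tmp.npy", np.empty(128))
tmp = np.load("tmp.npy", mmap_mode="r+")

head = np.array(tmp[:64])  # forced in-memory copy, no file backing
del tmp                    # close the mapping before touching the file
np.save("tmp.npy", head)   # now safe to overwrite

reloaded = np.load("tmp.npy")
```

`np.array(...)` (rather than `np.asarray`) is deliberate: it guarantees a copy even when the input is already an array.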
0 votes · 1 answer · 114 views
Numpy memmap still using RAM instead of disk while doing a vector operation
I initialize two operands and one result:
a = np.memmap('a.mem', mode='w+', dtype=np.int64, shape=(2*1024*1024*1024))
b = np.memmap('b.mem', mode='w+', dtype=np.int64, shape=(2*1024*1024*1024))
result ...
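A scaled-down sketch of why this happens and one workaround (the question's operands are 16 GiB each; a much smaller `n` is used here): a whole-array `a + b` allocates a full-size temporary and touches every page at once, while a ufunc with `out=` over a sliding window bounds the resident set to one chunk.

```python
import numpy as np

n = 1_000_000
a = np.memmap("a.mem", mode="w+", dtype=np.int64, shape=(n,))
b = np.memmap("b.mem", mode="w+", dtype=np.int64, shape=(n,))
result = np.memmap("result.mem", mode="w+", dtype=np.int64, shape=(n,))
a[:] = 1
b[:] = 2

# Process a window at a time; out= writes straight into the result
# mapping, so no full-size in-memory temporary is ever created.
chunk = 64 * 1024
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    np.add(a[start:stop], b[start:stop], out=result[start:stop])
result.flush()
```

The page cache may still grow during the pass, but that memory is reclaimable by the OS, unlike a full-size temporary array.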
1 vote · 0 answers · 152 views
Load, process and save larger-than-memory array using dask
I have a very large covariance matrix (480,000 x 480,000) stored on disk in a binary format. I want to compute a corresponding whitening matrix, for which I need to compute the SVD of the covariance ...
1 vote · 0 answers · 93 views
How to perform operations on a memory map without loading the whole file into memory?
I have an approximately 4.3 GB memory map and want to take the log of it without loading the whole thing into memory. Is there a way to assign cl that minimizes the amount of memory used?
import numpy
import psutil
...
0 votes · 1 answer · 127 views
How do I apply changes to np.memmap with multiprocessing?
The current task at hand that I have requires multiple array manipulations that take longer than what is feasible. I am trying to utilize the multiprocessing package to accelerate the process, but I ...
1 vote · 1 answer · 379 views
What's the best approach to extend memmap'ed Numpy or Dask arrays (bigger than available ram)?
I have a Numpy array on disk, bigger than my available ram.
I can load it as a memory-map and use it without problem:
a = np.memmap(filename, mode='r', shape=shape, dtype=dtype)
Further on, I can ...
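One sketch of extending without rewriting the existing data, using illustrative names and sizes: numpy extends the file itself when an `'r+'` mapping asks for a region past the end, so only the new tail needs to be mapped and written.

```python
import numpy as np

fname = "big.dat"
dt = np.dtype(np.float64)
np.memmap(fname, mode="w+", dtype=dt, shape=(1000,)).flush()

# Map only the appended region via offset=; the 1000 existing values
# are never loaded, copied, or even touched.
old_n, extra = 1000, 500
tail = np.memmap(fname, mode="r+", dtype=dt,
                 offset=old_n * dt.itemsize, shape=(extra,))
tail[:] = 7.0
tail.flush()
del tail

# The whole, now-larger array can be re-mapped read-only.
full = np.memmap(fname, mode="r", dtype=dt, shape=(old_n + extra,))
```

This works for growth along the first axis of a C-ordered array, where appended rows are contiguous at the end of the file.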
0 votes · 0 answers · 300 views
Efficient way to retrieve data from multiple numpy memmap files and create a new array
For machine learning I need to get data from multiple large memmap files, combine them, and return the result.
The number of variables (files) used is defined by the user.
At the moment I store the files in a ...
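A minimal sketch of one efficient shape for this, with an illustrative three-file setup standing in for the user-chosen variables: preallocate the combined batch once, then copy each requested slice straight out of its mapping, so only the slices are ever read into RAM.

```python
import numpy as np

# Illustrative setup: three per-variable files, as if chosen by the user.
paths = ["v0.mem", "v1.mem", "v2.mem"]
n = 1000
for i, p in enumerate(paths):
    m = np.memmap(p, mode="w+", dtype=np.float32, shape=(n,))
    m[:] = float(i)
    m.flush()
    del m

# Preallocate the output once, then fill row by row from each mapping.
rows = slice(100, 200)
batch = np.empty((len(paths), rows.stop - rows.start), dtype=np.float32)
for i, p in enumerate(paths):
    src = np.memmap(p, mode="r", dtype=np.float32, shape=(n,))
    batch[i] = src[rows]
```

Assigning into a preallocated array avoids the repeated reallocation that `np.concatenate` or list-append patterns would incur per batch.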
2 votes · 1 answer · 851 views
Numpy's memmap acting strangely?
I am dealing with large numpy arrays and I am trying out memmap as it could help.
big_matrix = np.memmap(parameters.big_matrix_path, dtype=np.float16, mode='w+', shape=(1000000, 1000000))
The above ...
0 votes · 0 answers · 197 views
Is there an optimized way to convert a numpy array to Fortran order when using memmaps?
I have a memmapped numpy array:
arr = np.load("a.npy", mmap_mode='r')
It is bigger than my memory. For my further computation I need it in Fortran order instead of C order. So I can use np....
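One way to sketch this without materializing the array: `np.memmap` accepts `order='F'`, so the Fortran-ordered result can be written out a block of columns at a time. The array below is a tiny stand-in for the on-disk one in the question.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
np.save("a.npy", a)
src = np.load("a.npy", mmap_mode="r")

# Destination mapping laid out in Fortran (column-major) order.
dst = np.memmap("a_f.dat", mode="w+", dtype=src.dtype,
                shape=src.shape, order="F")

# Copy a block of columns at a time: each block is small, and columns
# are contiguous in the destination's layout.
cols = 2
for j in range(0, src.shape[1], cols):
    dst[:, j:j + cols] = src[:, j:j + cols]
dst.flush()
```

The block width trades memory for I/O efficiency; wider blocks mean fewer, larger sequential reads from the C-ordered source.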