How to avoid memory mapping when loading a numpy file

Question

Csv file:

0,0,0,0,0,0,0,0,0,0.32,0.21,0,0.16,0,0,0,0,0,0,0.32
0,0,0,0,0,0,0.17,0,0.04,0,0,0.25,0.03,0.32,0,0.02,0.05,0.03,0.08,0
0.08,0.07,0.09,0.06,0,0,0.21,0.02,0,0,0,0,0,0,0,0.1,0.36,0,0,0
[continues in the same way: always 20 columns and an arbitrary number of rows]

I'm saving the array this way:

with open(csv_profile) as csv_file:
    array = np.loadtxt(csv_file, delimiter=",",dtype='str')
npy_profile=open(outfile, "wb")
np.save(npy_profile, array)

It is saved as u4 instead of f8, which is what I need.

I noticed the wrong data type because the header of the output file says:

<93>NUMPY^A^@v^@{'descr': '<U4', 'fortran_order': False, 'shape': (680, 20), }

Also when I load it:

profile_matrix=np.load(npy_profile,"r")

the class type is numpy.memmap instead of numpy.ndarray. How can I avoid this issue?

I want both to save it in the correct format and to load it in the correct format.

The np.load docs say, regarding the 2nd argument (if given): "if not None, then memory-map the file, using the given mode". You specified 'r'.
– hpaulj, Commented Jan 31, 2022 at 17:03
Thanks! Do you know by any chance why it is saving as unsigned integers instead of floats?
– Caterina, Commented Jan 31, 2022 at 17:05
How did you create the array? Is it actually a float array? And how did you notice that it is stored in the wrong format?
– Jakob Stark, Commented Jan 31, 2022 at 17:16
You can try adding .astype(float) to array in your call to np.save.
– Jakob Stark, Commented Jan 31, 2022 at 17:24
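
On the dtype question raised in the comments: the '<U4' in the file header denotes a little-endian Unicode string of up to 4 characters, not an unsigned integer (that would be a lowercase 'u4'). It comes from passing dtype='str' to np.loadtxt. Below is a minimal sketch of the two ways around it, assuming a short in-memory CSV (csv_text, a stand-in that is not part of the original post) in place of the real file:

```python
import io

import numpy as np

# Hypothetical stand-in for the CSV file from the question.
csv_text = "0,0.32,0.21\n0.17,0,0.04\n"

# dtype='str' keeps every field as text, hence a Unicode ('U') dtype.
as_str = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype="str")

# Omitting dtype lets np.loadtxt use its default, which is float (f8).
as_float = np.loadtxt(io.StringIO(csv_text), delimiter=",")

# An existing string array can also be converted after loading.
converted = as_str.astype(float)

print(as_str.dtype.kind, as_float.dtype, converted.dtype)
```

Either as_float or converted can then be passed to np.save to get an f8 file.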

Jakob Stark · Accepted Answer · 2022-01-31 17:04:27Z

Looking into the manual we can see that the second parameter of numpy.load is called mmap_mode and is set to "r" in your code. This enables memory mapping the file:

"A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory."

Memory mapping is normally not an "issue", as you called it, but a feature that enables faster file access and saves memory for large files. When doing memory-mapped I/O, your operating system maps parts of the file into the address space of your program, so the data does not have to be copied into RAM up front. In a writable mode such as "r+", any changes made to the memory-mapped numpy array are reflected directly in the file. Because you specified read-only access ("r"), you cannot change values in the array.

If you want to disable memory mapping, remove the second argument "r" from the call to numpy.load. You then get a fresh copy of the array in RAM as a regular numpy.ndarray, which you can modify without affecting the file.
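
The difference can be checked with a small sketch. The file name profile.npy and the zero-filled array are assumptions made for illustration, not the asker's data, and the temporary directory is left in place for simplicity:

```python
import os
import tempfile

import numpy as np

# Hypothetical file in a scratch directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "profile.npy")
np.save(path, np.zeros((3, 20)))

mapped = np.load(path, mmap_mode="r")  # memory-mapped, read-only
in_ram = np.load(path)                 # plain in-memory copy

print(type(mapped).__name__)  # memmap
print(type(in_ram).__name__)  # ndarray

# The in-memory copy is writable and independent of the file.
in_ram[0, 0] = 1.0

# Writing through a read-only memory map is rejected.
try:
    mapped[0, 0] = 1.0
except ValueError as exc:
    print("read-only:", exc)
```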

jmd_dk · Accepted Answer · 2022-01-31 17:14:46Z

While the answer from Jakob Stark explains what the additional "r" argument to np.load() does, let me suggest a simpler and safer usage. To save and load NumPy arrays in the straightforward way (no memory mapping, etc.), use the most straightforward syntax:

np.save('filename.npy', array)
array2 = np.load('filename.npy')

You don't have to specify the dtype or anything else when saving and loading: np.save() stores the array with whatever dtype it already has, and np.load() returns it as a regular ndarray, as you are expecting. Also, not manually opening the file prior to calling np.save() means that you do not have to worry about closing it again (manual file handling should generally be written inside a with block or a try/finally block, which further adds to the complexity).
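
Put together with the CSV step from the question, a full round trip might look like the sketch below. The in-memory CSV text and the temporary directory are assumptions for the sake of a self-contained example; the asker's real csv_profile and outfile paths would take their place:

```python
import io
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the CSV file from the question.
csv_text = "0,0.32,0.21\n0.17,0,0.04\n"

# No dtype argument: np.loadtxt parses the fields as floats by default.
array = np.loadtxt(io.StringIO(csv_text), delimiter=",")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filename.npy")
    np.save(path, array)   # np.save opens and closes the file itself
    array2 = np.load(path)  # no mmap_mode, so the data is read into RAM

print(type(array2).__name__, array2.dtype)
```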
