Finding shape of saved numpy array (.npy or .npz) without loading into memory

Question

I have a huge compressed numpy array saved to disk (~20 GB in memory, much less when compressed). I need to know the shape of this array, but I do not have enough memory available to load it. How can I find the shape of the numpy array without loading it into memory?

John Zwinck · Accepted Answer · 2017-04-05 06:28:02Z

20

This does it:

import zipfile

import numpy as np

def npz_headers(npz):
    """Takes a path to an .npz file, which is a Zip archive of .npy files.
    Generates a sequence of (name, shape, np.dtype).
    """
    with zipfile.ZipFile(npz) as archive:
        for name in archive.namelist():
            if not name.endswith('.npy'):
                continue

            # Open the member as a stream and close it again once the header is parsed.
            with archive.open(name) as npy:
                version = np.lib.format.read_magic(npy)
                # _read_array_header is a private NumPy helper and may change between releases.
                shape, fortran, dtype = np.lib.format._read_array_header(npy, version)
            # Strip the '.npy' suffix to recover the array's name.
            yield name[:-4], shape, dtype
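
The generator above relies on the private helper `_read_array_header`. As a hedged alternative, the sketch below does the same job with the public readers `read_array_header_1_0` and `read_array_header_2_0` from `numpy.lib.format`. It assumes the members were written in .npy format version 1.0 or 2.0 and raises otherwise. The names `read_npy_header`, `npz_shapes` and `demo` are illustrative and not part of the original answer.

```python
import os
import tempfile
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(fileobj):
    """Return (shape, fortran_order, dtype) from an open binary .npy stream."""
    version = npy_format.read_magic(fileobj)
    if version == (1, 0):
        return npy_format.read_array_header_1_0(fileobj)
    if version == (2, 0):
        return npy_format.read_array_header_2_0(fileobj)
    # Assumption: other format versions are out of scope for this sketch.
    raise ValueError(f"unsupported .npy format version: {version}")


def npz_shapes(path):
    """Yield (name, shape, dtype) for each array stored in an .npz archive."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".npy"):
                continue
            # Only the header is parsed; the array itself is never built.
            with archive.open(name) as member:
                shape, _fortran, dtype = read_npy_header(member)
            yield name[:-4], shape, dtype


def demo():
    """Write a small compressed archive and list its headers."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.npz")
        np.savez_compressed(path, a=np.zeros((3, 4)), b=np.arange(5, dtype=np.int32))
        return list(npz_shapes(path))


if __name__ == "__main__":
    for name, shape, dtype in demo():
        print(name, shape, dtype)
```

Parsing the header from the Zip member stream means only the start of each member is read, so this should work for archives written with either `np.savez` or `np.savez_compressed`.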


1 Comment

Aristides Over a year ago

This answer is perfect and should really be the accepted one...

hpaulj · Accepted Answer · 2016-03-14 16:42Z (edited 2017-05-23 12:26:23Z)

8

Opening the file with np.load(..., mmap_mode='r') might do the trick. The np.load docstring describes the mmap_mode parameter:

    If not None, then memory-map the file, using the given mode
    (see `numpy.memmap` for a detailed description of the modes).
    A memory-mapped array is kept on disk. However, it can be accessed
    and sliced like any ndarray.  Memory mapping is especially useful for
    accessing small fragments of large files without reading the entire
    file into memory.
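
For a plain .npy file, the memory-mapped approach can be sketched as follows. This is a minimal illustration, not a tested recipe for very large files. The helper name `npy_shape_via_mmap` is hypothetical, and the approach applies to .npy files rather than .npz archives, as the comments below point out.

```python
import os
import tempfile

import numpy as np


def npy_shape_via_mmap(path):
    """Return (shape, dtype) of a .npy file without reading its data buffer."""
    # mmap_mode="r" maps the file read-only; data pages are only touched on access.
    arr = np.load(path, mmap_mode="r")
    return arr.shape, arr.dtype


def demo():
    """Save a small array and read its shape back through a memory map."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.npy")
        np.save(path, np.zeros((10, 20), dtype=np.float32))
        return npy_shape_via_mmap(path)


if __name__ == "__main__":
    print(demo())
```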

It is also possible to read the header block without reading the data buffer, but that requires digging further into the underlying np.lib.npyio and np.lib.format code. I explored that in an answer to a recent SO question about storing multiple arrays in a single file (and reading them):

https://stackoverflow.com/a/35752728/901925
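
As a rough sketch of reading just the header block of a standalone .npy file, the code below uses the public readers in `numpy.lib.format` and then estimates how much memory the full array would need. It assumes format version 1.0 or 2.0. The names `npy_header` and `estimated_nbytes` are illustrative, not NumPy API.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def npy_header(path):
    """Return (shape, fortran_order, dtype) by reading only the .npy header."""
    with open(path, "rb") as f:
        version = npy_format.read_magic(f)
        if version == (1, 0):
            return npy_format.read_array_header_1_0(f)
        if version == (2, 0):
            return npy_format.read_array_header_2_0(f)
        # Assumption: other format versions are not handled in this sketch.
        raise ValueError(f"unsupported .npy format version: {version}")


def estimated_nbytes(shape, dtype):
    """Bytes the array would occupy in memory if it were fully loaded."""
    count = 1
    for dim in shape:
        count *= dim
    return count * dtype.itemsize


def demo():
    """Save a small array, then recover its header and in-memory size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.npy")
        np.save(path, np.zeros((6, 7), dtype=np.float32))
        shape, fortran, dtype = npy_header(path)
        return shape, fortran, dtype, estimated_nbytes(shape, dtype)


if __name__ == "__main__":
    print(demo())
```

Comparing the estimate with the available memory gives a quick way to decide whether loading the whole array is feasible.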


answered Mar 14, 2016 at 16:42

hpaulj


3 Comments

John Zwinck Over a year ago

This works for .npy but not .npz. I don't think mmap is at all useful with .npz--certainly not if the data are compressed aka np.savez_compressed().

hpaulj Over a year ago

Doing any of this with the npz archive will require digging into that branch of the loader, np.lib.npyio.NpzFile. Key file format information is in np.lib.npyio.format

John Zwinck Over a year ago

Indeed. I've implemented it in an answer here.
