My question is simple, and I could not find a resource that answers it. Somewhat similar links cover using asarray and the memory of numbers in general, and the most succinct one is here.
How can I "calculate" the overhead of loading a numpy array into RAM (if there is any overhead)? Or, how to determine the least amount of RAM needed to hold all arrays in memory (without time-consuming trial and error)?
In short, I have several numpy arrays of shape (x, 1323000, 1), with x being as high as 6000. This leads to a disk usage of 30GB for the largest file.
All files together need 50GB. Would it therefore be enough to provision slightly more than 50GB of RAM (via Kubernetes)? I want to use the RAM as efficiently as possible, so simply requesting 100GB is not an option.
For an array of shape (x, 1323000, 1), all you need to know is the dtype (np.float64, np.int64, np.float32, etc.) and the number of bytes used by that type (e.g. via finfo or iinfo). The memory required is x * 1323000 * 8 bytes: the total number of elements times the size of each element (typically 8 bytes). The overhead on top of that is tiny. However, using the array(s) may produce copies, permanent or temporary, so in practice you will probably need 2-3x as much memory.
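To make the arithmetic concrete, here is a small sketch. The float32 dtype is an assumption (the ~30GB on-disk figure for x = 6000 suggests 4-byte elements), and "arrays.npy" is a hypothetical file name; substitute whatever dtype and paths your data actually uses.

    import numpy as np

    # Expected in-memory size = number of elements * bytes per element.
    x = 6000                              # largest x mentioned in the question
    shape = (x, 1323000, 1)
    dtype = np.dtype(np.float32)          # assumption; use np.float64 if that is your dtype

    n_elements = np.prod(shape, dtype=np.int64)
    expected_bytes = n_elements * dtype.itemsize
    print(f"expected size: {expected_bytes / 1e9:.1f} GB")   # ~31.8 GB for float32

    # Bytes per element for a given dtype, via finfo/iinfo or itemsize:
    print(np.finfo(np.float32).bits // 8)   # 4
    print(np.iinfo(np.int64).bits // 8)     # 8

    # For an array already in memory, .nbytes gives the exact data size;
    # the Python object wrapping it adds only on the order of 100 bytes.
    a = np.zeros((10, 1323000, 1), dtype=np.float32)
    print(a.nbytes)                          # 52_920_000 bytes

    # To size a saved .npy file without loading its data, a memory-mapped
    # open reads only the header ("arrays.npy" is a hypothetical path):
    # m = np.load("arrays.npy", mmap_mode="r")
    # print(m.shape, m.dtype, m.nbytes)

Summing .nbytes (or shape * itemsize) over all files gives the floor for RAM; the 2-3x headroom mentioned above is to cover the temporary or permanent copies that later operations on the arrays may create.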