My question is simple, and I could not find a resource that answers it. Somewhat similar links cover using asarray and the memory of numbers in general, and the most succinct one is here.
How can I "calculate" the overhead of loading a numpy array into RAM (if there is any overhead)? Or, how to determine the least amount of RAM needed to hold all arrays in memory (without time-consuming trial and error)?
In short, I have several numpy arrays of shape (x, 1323000, 1), with x being as high as 6000. This leads to a disk usage of 30GB for the largest file.
All files together need 50GB. Would it therefore be enough to provision slightly more than 50GB of RAM (via Kubernetes)? I want to use the RAM as efficiently as possible, so simply requesting 100GB is not an option.
For an array of shape (x, 1323000, 1), all you need to know is the dtype (np.float64, np.int64, np.float32, etc.) and the number of bytes used by that type (e.g. via finfo or iinfo). The memory required is x * 1323000 * 8 bytes: the total number of elements times the size of each element (typically 8 bytes). The overhead on top of that is tiny. However, using the array(s) may produce copies, permanent or temporary, so in practice you will probably need 2-3x as much memory.
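To make the arithmetic concrete, here is a small sketch. The float32 dtype is an assumption (the ~30GB on-disk figure for x = 6000 suggests 4-byte elements), and "arrays.npy" is a hypothetical file name; substitute whatever dtype and paths your data actually uses.

    import numpy as np

    # Expected in-memory size = number of elements * bytes per element.
    x = 6000                              # largest x mentioned in the question
    shape = (x, 1323000, 1)
    dtype = np.dtype(np.float32)          # assumption; use np.float64 if that is your dtype

    n_elements = np.prod(shape, dtype=np.int64)
    expected_bytes = n_elements * dtype.itemsize
    print(f"expected size: {expected_bytes / 1e9:.1f} GB")   # ~31.8 GB for float32

    # Bytes per element for a given dtype, via finfo/iinfo or itemsize:
    print(np.finfo(np.float32).bits // 8)   # 4
    print(np.iinfo(np.int64).bits // 8)     # 8

    # For an array already in memory, .nbytes gives the exact data size;
    # the Python object wrapping it adds only on the order of 100 bytes.
    a = np.zeros((10, 1323000, 1), dtype=np.float32)
    print(a.nbytes)                          # 52_920_000 bytes

    # To size a saved .npy file without loading its data, a memory-mapped
    # open reads only the header ("arrays.npy" is a hypothetical path):
    # m = np.load("arrays.npy", mmap_mode="r")
    # print(m.shape, m.dtype, m.nbytes)

Summing .nbytes (or shape * itemsize) over all files gives the floor for RAM; the 2-3x headroom mentioned above is to cover the temporary or permanent copies that later operations on the arrays may create.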