I have a very large numpy array which I created and saved with no problem using
numpy.save('file.npy', NumpyArrays)
However, when I try to load it using
NumpyArrays = numpy.load('file.npy')
in the exact same environment (Google Colaboratory), the environment crashes due to lack of memory. I have tried restarting the environment so that it's fresh, and the only operation I attempt is loading the array, but it still crashes.
How can the environment in which the array was originally created and saved use less memory than simply loading that same array from disk?
I have tried using the answer here to save memory
Efficient way to partially read large numpy file?
by memory-mapping the file instead of loading it all at once, but I get this error
ValueError: Array can't be memory-mapped: Python objects in dtype.
I'm guessing this is because the second column is a list of integers.
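For reference, here is a minimal sketch that I believe reproduces the same error on a tiny stand-in array. The sample values, the temporary file path, and the use of mmap_mode="r" (as I understand the linked answer to suggest) are assumptions for illustration, not my real data or exact call.

```python
import os
import tempfile

import numpy as np

# Tiny stand-in for the real data: an integer ID plus a variable-length
# Python list per row, which forces dtype=object.
arr = np.empty((2, 2), dtype=object)
arr[0, 0], arr[0, 1] = 0, [10158697, 5255434]
arr[1, 0], arr[1, 1] = 1, [5985929, 7356938, 5232932]

# Hypothetical temporary path, used only for this reproduction.
path = os.path.join(tempfile.mkdtemp(), "file.npy")
np.save(path, arr)  # object arrays are stored as a pickle inside the .npy file

try:
    np.load(path, mmap_mode="r", allow_pickle=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this matches what is happening, the problem is the object dtype itself rather than the size of the file: a pickled array of Python lists has no fixed-size binary layout on disk to map.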
If this is relevant, this is what my numpy variable looks like
numpyVariable[0:5]
array([[0, list([10158697, 5255434, 9860860, 3677049, 3451292, 7225330])],
[1,
list([5985929, 7356938, 5232932, 4623077, 10461651, 6629144, 2738221, 7672279, 3197654, 11678039, 1912097, 6581279, 8141689, 6694817, 6139889, 7946369, 3995629, 3169031, 3793217, 6990097, 11298098, 6120907, 5336712, 7366785, 7363171, 3933563, 6484209, 4243394, 6371367, 4361218, 11469370, 6166715, 11519607, 11602639, 10759034, 6432476, 5327726, 11390220, 7009744, 10225744, 3781058, 1305863, 462965, 1158562, 2620006, 73896, 4945223, 11780201, 3044821])],
[2, list([10847593, 8665775, 341568, 4164850, 6509965, 8227738])],
[3,
list([9105020, 1896456, 2757197, 5911741, 8123078, 10629261, 5646782, 5255907, 8802504, 3735293, 5496511, 1612181, 10029269, 8911733, 8035123, 4855475, 2226494, 10448630, 2041328, 532211, 10049766, 7320606, 7783187, 11536583, 9192742, 8965808, 7750786, 2462038, 111935, 4306882, 11193228])],
[4,
list([11406300, 9947761, 2539951, 1928472, 1286647, 1360522, 9680046, 1304518, 2577907, 5903319, 6304940, 8249558, 11156695, 5704721, 9753227, 465481, 8849435, 5040956, 8124190, 11094867, 9225419, 10531161, 3796335, 6660230, 823696, 3271428, 9167574])]],
dtype=object)
Since that could be tricky to interpret, here's the original pandas DataFrame from which the numpy array was converted (using df.values)
EmbedID MappedC
0 0 [10158697, 5255434, 9860860, 3677049, 3451292,...
1 1 [5985929, 7356938, 5232932, 4623077, 10461651,...
2 2 [10847593, 8665775, 341568, 4164850, 6509965, ...
3 3 [9105020, 1896456, 2757197, 5911741, 8123078, ...
4 4 [11406300, 9947761, 2539951, 1928472, 1286647,...
The first column is an integer; the second column is a list of integers.
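To make the structure concrete, this is a small sketch of how an array like mine comes out of such a DataFrame. The column names match the ones above, but the three rows are shortened, made-up stand-ins, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the real DataFrame: an integer column and a
# column whose cells are Python lists of integers.
df = pd.DataFrame(
    {
        "EmbedID": [0, 1, 2],
        "MappedC": [
            [10158697, 5255434],
            [5985929],
            [10847593, 8665775, 341568],
        ],
    }
)

numpy_variable = df.values  # mixed int / list columns collapse to dtype=object

print(numpy_variable.dtype)
print(type(numpy_variable[0, 1]))
```

Each cell of the second column is a separate Python list object, so the array only holds references to them rather than one contiguous block of integers.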