I have a very large numpy array which I created and saved with no problem using

numpy.save('file.npy', NumpyArrays)

However, when I tried to load using

NumpyArrays = numpy.load('file.npy')

in the exact same environment (Google Colaboratory), the runtime crashes due to lack of memory. I have tried restarting the environment so that it's fresh, and the only operation I attempt is loading the array, but it still crashes.

How can the environment in which the original numpy array was created and saved use less memory than simply loading that same array back from disk?

I have tried using the answer here to save memory

Efficient way to partially read large numpy file?

by only opening in write mode, but I get this error

ValueError: Array can't be memory-mapped: Python objects in dtype.

I'm guessing this is because the second column is a list of integers.
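
To check that guess, here is a minimal sketch that reproduces the error on a tiny stand-in array. The two rows and the temporary file name are made up for illustration. The only assumption is that the real array has the same layout (dtype=object, with Python lists in the second column).

```python
import os
import tempfile

import numpy as np

# Tiny stand-in for the real array: an int column and a column of Python lists.
small = np.empty((2, 2), dtype=object)
small[0, 0], small[0, 1] = 0, [10158697, 5255434]
small[1, 0], small[1, 1] = 1, [5985929, 7356938, 5232932]

# Hypothetical temporary file, used only for this reproduction.
path = os.path.join(tempfile.mkdtemp(), "small.npy")
np.save(path, small)  # object arrays are pickled inside the .npy file

# Memory-mapping needs fixed-size elements, so an object dtype is rejected.
try:
    np.load(path, mmap_mode="r", allow_pickle=True)
    mmap_error = None
except ValueError as exc:
    mmap_error = str(exc)

print(mmap_error)
```

If this prints the same message as above, the object dtype is what blocks memory-mapping, regardless of how large the file is.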

If this is relevant, this is what my numpy variable looks like

numpyVariable[0:5]

array([[0, list([10158697, 5255434, 9860860, 3677049, 3451292, 7225330])],
       [1,
        list([5985929, 7356938, 5232932, 4623077, 10461651, 6629144, 2738221, 7672279, 3197654, 11678039, 1912097, 6581279, 8141689, 6694817, 6139889, 7946369, 3995629, 3169031, 3793217, 6990097, 11298098, 6120907, 5336712, 7366785, 7363171, 3933563, 6484209, 4243394, 6371367, 4361218, 11469370, 6166715, 11519607, 11602639, 10759034, 6432476, 5327726, 11390220, 7009744, 10225744, 3781058, 1305863, 462965, 1158562, 2620006, 73896, 4945223, 11780201, 3044821])],
       [2, list([10847593, 8665775, 341568, 4164850, 6509965, 8227738])],
       [3,
        list([9105020, 1896456, 2757197, 5911741, 8123078, 10629261, 5646782, 5255907, 8802504, 3735293, 5496511, 1612181, 10029269, 8911733, 8035123, 4855475, 2226494, 10448630, 2041328, 532211, 10049766, 7320606, 7783187, 11536583, 9192742, 8965808, 7750786, 2462038, 111935, 4306882, 11193228])],
       [4,
        list([11406300, 9947761, 2539951, 1928472, 1286647, 1360522, 9680046, 1304518, 2577907, 5903319, 6304940, 8249558, 11156695, 5704721, 9753227, 465481, 8849435, 5040956, 8124190, 11094867, 9225419, 10531161, 3796335, 6660230, 823696, 3271428, 9167574])]],
      dtype=object)

Since that could be tricky to interpret, here's the original pandas dataframe from which the numpy array was converted (using df.values)

   EmbedID  MappedC
0        0  [10158697, 5255434, 9860860, 3677049, 3451292,...
1        1  [5985929, 7356938, 5232932, 4623077, 10461651,...
2        2  [10847593, 8665775, 341568, 4164850, 6509965, ...
3        3  [9105020, 1896456, 2757197, 5911741, 8123078, ...
4        4  [11406300, 9947761, 2539951, 1928472, 1286647,...

The first column is an integer, and the second column is a list of integers.
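
For reference, one layout that might avoid the object dtype altogether is to store all the lists as a single flat integer array plus an array of row offsets. This is only a sketch under that assumption, not something I have run on the real data. The names mapped_c, flat.npy, offsets.npy and row are hypothetical, and the three short lists stand in for the MappedC column.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the MappedC column: one list of ints per row.
mapped_c = [[10158697, 5255434, 9860860], [5985929, 7356938], [10847593]]

# Concatenate every list into one flat int64 array ...
flat = np.fromiter((v for lst in mapped_c for v in lst), dtype=np.int64)
# ... and record where each row starts and ends within it.
offsets = np.zeros(len(mapped_c) + 1, dtype=np.int64)
offsets[1:] = np.cumsum([len(lst) for lst in mapped_c])

# Both arrays have plain numeric dtypes, so no pickling is involved.
tmp = tempfile.mkdtemp()
np.save(os.path.join(tmp, "flat.npy"), flat)
np.save(os.path.join(tmp, "offsets.npy"), offsets)

# The flat array can be memory-mapped; the small offsets array is loaded fully.
flat_mm = np.load(os.path.join(tmp, "flat.npy"), mmap_mode="r")
offsets_loaded = np.load(os.path.join(tmp, "offsets.npy"))


def row(i):
    """Return the integers belonging to row i as a slice of the memmap."""
    return flat_mm[offsets_loaded[i]:offsets_loaded[i + 1]]


row1 = row(1).tolist()
print(row1)
```

With this layout the EmbedID column is just the row index, so it would not need to be stored separately if the IDs really are 0, 1, 2, ... as in the sample above.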
