NumPy array crashes with a memory error when loaded from file, even though the original array was created in the same environment

I have a very large numpy array which I created and saved with no problem using

numpy.save('file.npy', NumpyArrays)

However, when I tried to load using

NumpyArrays = numpy.load('file.npy')

in the exact same environment (Google Colaboratory), the environment crashes due to lack of memory. I have tried restarting the environment so that it is fresh, with loading the array as the only operation I attempt, but it still crashes.

How can the environment in which the original numpy array was created and saved use less memory than simply loading that same array back from disk?

To save memory, I tried the approach from the answer to this question:

Efficient way to partially read large numpy file?

by memory-mapping the file, opening it in write mode only, but I get this error:

ValueError: Array can't be memory-mapped: Python objects in dtype.

I'm guessing this is because each entry in the second column is a Python list of integers, which makes the array's dtype object.
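For reference, the error can be reproduced on a tiny array with the same shape of data. This is only a minimal sketch: the sample values are shortened from the output shown in the question, the helper name `try_memmap_object_array` is hypothetical, and `mmap_mode="r"` is an assumption about how the memory-mapped load was requested.

```python
import os
import tempfile

import numpy as np


def try_memmap_object_array():
    """Save a small object-dtype array, then try to memory-map it on load."""
    # Two rows: an integer ID and a Python list of integers (shortened sample data).
    arr = np.empty((2, 2), dtype=object)
    arr[0, 0], arr[0, 1] = 0, [10158697, 5255434]
    arr[1, 0], arr[1, 1] = 1, [5985929, 7356938, 5232932]

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.npy")
        np.save(path, arr)  # object arrays are stored via pickle
        try:
            # Assumed call: request a memory-mapped, read-only view of the file.
            np.load(path, mmap_mode="r", allow_pickle=True)
        except ValueError as exc:
            return str(exc)
    return None


message = try_memmap_object_array()
print(message)
```

If the guess above is right, the printed message should mention that the array cannot be memory-mapped, since the file holds pickled Python objects rather than a fixed-size numeric buffer.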

If this is relevant, this is what my numpy variable looks like

numpyVariable[0:5]

array([[0, list([10158697, 5255434, 9860860, 3677049, 3451292, 7225330])],
       [1,
        list([5985929, 7356938, 5232932, 4623077, 10461651, 6629144, 2738221, 7672279, 3197654, 11678039, 1912097, 6581279, 8141689, 6694817, 6139889, 7946369, 3995629, 3169031, 3793217, 6990097, 11298098, 6120907, 5336712, 7366785, 7363171, 3933563, 6484209, 4243394, 6371367, 4361218, 11469370, 6166715, 11519607, 11602639, 10759034, 6432476, 5327726, 11390220, 7009744, 10225744, 3781058, 1305863, 462965, 1158562, 2620006, 73896, 4945223, 11780201, 3044821])],
       [2, list([10847593, 8665775, 341568, 4164850, 6509965, 8227738])],
       [3,
        list([9105020, 1896456, 2757197, 5911741, 8123078, 10629261, 5646782, 5255907, 8802504, 3735293, 5496511, 1612181, 10029269, 8911733, 8035123, 4855475, 2226494, 10448630, 2041328, 532211, 10049766, 7320606, 7783187, 11536583, 9192742, 8965808, 7750786, 2462038, 111935, 4306882, 11193228])],
       [4,
        list([11406300, 9947761, 2539951, 1928472, 1286647, 1360522, 9680046, 1304518, 2577907, 5903319, 6304940, 8249558, 11156695, 5704721, 9753227, 465481, 8849435, 5040956, 8124190, 11094867, 9225419, 10531161, 3796335, 6660230, 823696, 3271428, 9167574])]],
      dtype=object)

Since that could be tricky to interpret, here's the original pandas dataframe from which the numpy array was converted (using df.values):

    EmbedID MappedC
0   0   [10158697, 5255434, 9860860, 3677049, 3451292,...
1   1   [5985929, 7356938, 5232932, 4623077, 10461651,...
2   2   [10847593, 8665775, 341568, 4164850, 6509965, ...
3   3   [9105020, 1896456, 2757197, 5911741, 8123078, ...
4   4   [11406300, 9947761, 2539951, 1928472, 1286647,...

The first column is an integer; the second column is a list of integers.
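To make the structure concrete, here is a minimal sketch that builds a two-row array with the same layout directly in numpy. The `rows` values are shortened samples from the output above, and the variable names are illustrative only; the real array came from `df.values`.

```python
import numpy as np

# Shortened sample rows: (EmbedID, MappedC) with MappedC as a Python list.
rows = [
    (0, [10158697, 5255434, 9860860]),
    (1, [5985929, 7356938]),
]

# Allocate an object array first, so each list is stored as a single cell
# instead of being expanded into extra dimensions.
arr = np.empty((len(rows), 2), dtype=object)
for i, (embed_id, mapped) in enumerate(rows):
    arr[i, 0] = embed_id
    arr[i, 1] = mapped

print(arr.dtype)            # dtype of the whole array
print(type(arr[0, 1]))      # type of one cell in the second column
print(arr.dtype.hasobject)  # whether the dtype contains Python objects
```

Each cell of such an array is a reference to a separate Python object, so the array is not one contiguous block of numbers.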

asked Oct 15, 2018 at 8:27

SantoshGupta7

