Hi, I have a simple line that creates a random array for a rather large dataset:

import numpy as np
import random
N=276233
L=138116

np.random.random([L,N])

But I get this error:

Traceback (most recent call last):
  File "<string>", line 3 (23), in <module>
  File "mtrand.pyx", line 760, in mtrand.RandomState.random_sample (numpy\random\mtrand\mtrand.c:5713)
  File "mtrand.pyx", line 137, in mtrand.cont0_array (numpy\random\mtrand\mtrand.c:1300)
MemoryError

What is the solution, and what is the limit on the array size?

    If you can use a smaller integer type rather than doubles, you could reduce the memory footprint by quite a bit. However, depending on the goals of your analysis and the nature of your data, this may not be possible. Commented Jan 19, 2015 at 20:40

1 Answer

You are trying to create an array that would require about 284 GiB of memory (L * N elements at 8 bytes per double):

In [16]: L * N * 8 / (1024. ** 3)
Out[16]: 284.25601890683174
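The same arithmetic can be repeated for smaller element types. This is a minimal sketch using the shape from the question; `footprint_gib` is a hypothetical helper written for illustration, not a NumPy function:

```python
import numpy as np

N = 276233
L = 138116

def footprint_gib(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    # itemsize is the number of bytes one element occupies
    return n_elements * np.dtype(dtype).itemsize / 1024.0 ** 3

for dtype in (np.float64, np.float32, np.uint8):
    print(np.dtype(dtype).name, round(footprint_gib((L, N), dtype), 1))
```

Even at one byte per element the array needs about 35.5 GiB, so a smaller dtype helps but does not make this shape fit on a typical machine.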

Either buy a lot more RAM (and make sure your system can handle it) or find a way to avoid generating a 138,116 x 276,233 matrix all at once.
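If only a per-row reduction of the random matrix is needed, one way to avoid holding the whole thing is to generate it in blocks of rows and keep just the reduced result. The sketch below assumes the goal is a row-wise sum; `chunked_row_sums`, `rows_per_block` and `seed` are hypothetical names chosen for illustration, and the small shape in the usage line is only there so it runs quickly:

```python
import numpy as np

def chunked_row_sums(n_rows, n_cols, rows_per_block=64, seed=0):
    """Row sums of an n_rows x n_cols uniform random matrix, built block by block."""
    rng = np.random.RandomState(seed)
    sums = np.empty(n_rows)
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        # Only this block of rows exists in memory at any time.
        block = rng.random_sample((stop - start, n_cols))
        sums[start:stop] = block.sum(axis=1)
    return sums

# Small demonstration shape; the real L and N would take far longer to run.
sums = chunked_row_sums(1000, 2000, rows_per_block=128)
```

With 64 rows per block and N = 276233 columns, each block of doubles takes about 135 MiB. Comparisons between neighbours within a row work unchanged inside a block; comparisons between adjacent rows would additionally need the last row of the previous block carried over.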


5 Comments

Hmm... How did you get the number 284 GB? I have N*L = 3.81x10^10 bits / 8 bits per byte = 4.7 GB. Am I wrong?
276233 * 138116 * 8 / (1024 ** 3). Each double uses 8 bytes.
Ah okay, thanks! I thought each element was one bit. :( I guess I can't use this method for a large dataset that requires a random matrix...
@Arbitel: Are you sure you need to generate the entire matrix at once?
It goes back to this question I posted earlier: stackoverflow.com/questions/28015281/numpy-optimization. I basically had to compare the neighbouring elements, and after that I would have to do a row-wise summation. So I guess I have to?
