I have a function that assigns values to a matrix depending on a condition. My dataset size is usually in the range of 30-50k. I am not sure if this is the correct way to use NumPy, but with more than 5k numbers it gets really slow. Is there a way to make it faster?

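import numpy as np

N = 5000  # dataset size
L = N // 2  # integer division, so L can be used as an array dimension
d = 0.1
constant = 5

x = constant + d * np.random.random(N)

matrix = np.zeros([L, N])

print("Assigning matrix")
for k in range(L):
    # left border: always random
    for i in range(k + 1):
        matrix[k, i] = np.random.random()

    # interior: zero where x[i] exceeds both neighbours k+1 positions away
    for i in range(k + 1, N - k - 1):
        if (x[i] > x[i - k - 1]) and (x[i] > x[i + k + 1]):
            matrix[k, i] = 0
        else:
            matrix[k, i] = np.random.random()

    # right border: always random
    for i in range(N - k - 1, N):
        matrix[k, i] = np.random.random()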

1 Answer

If you use Python for loops, you lose the speed advantage of NumPy. The way to get speed is to use NumPy's built-in functions and vectorized operations. Is there a way you can create a random matrix first:

matrix = np.random.random((L, N))

Then do something to this matrix to put the 0's where you want them? Can you elaborate on the condition for setting an entry to 0? For example, you can make the matrix and then do:

matrix[matrix > value]

to select all values above a threshold. If the condition can be expressed as a boolean indexer or an arithmetic operation, you can speed it up. If it has to stay in the for loop (i.e. it depends on the values surrounding it as the loop cycles), it may not be possible to vectorize it.
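As a minimal sketch of the boolean-indexing idea above: the same mask can be used both to select entries and to assign to them in place. The array shape, the seed, and the name `threshold` are made up for illustration and are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed only so the sketch is repeatable
matrix = rng.random((4, 6))      # uniform values in [0, 1)

threshold = 0.5                  # hypothetical cutoff, chosen for illustration
mask = matrix > threshold        # boolean array with the same shape as matrix

selected = matrix[mask]          # 1-D copy of the entries above the threshold
matrix[mask] = 0                 # in-place assignment through the same mask
```

Selecting with a mask returns a flattened copy, while assigning through a mask modifies the original array.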

3 Comments

Thanks for your reply Adam, you are right, I could probably make a random matrix first and then assign the 0 entries to it. Is it possible to do that with something like this after creating the random matrix? -> matrix[(x[i] > x[i-k-1]) and (x[i] > x[i+k+1])] = 0. The condition for assigning the 0 entries is basically what you have seen there: in row k, if element x[i] is larger than both the element k+1 positions before it and the element k+1 positions after it, the entry is set to 0.
So the assignment of the zeros has to depend on the neighboring values of the array? If there's no way to get around that, I think what people usually do is go to a library like Cython and write a for loop like yours that runs at the C level for speed. AFAIK, you can't speed up an operation that has to iterate item by item. If you outline the reasoning for this operation, maybe there's a vectorized way to do it that someone else would know of.
It should be possible to construct the mask in pure NumPy using some tricks, but the code will likely become hard to understand. As Adam said, using Cython or ctypes instead is probably your better bet.
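One possible way to build such a mask in pure NumPy, offered as a sketch rather than a definitive answer: because the condition only reads `x` and never reads `matrix`, each row's interior test can be written as a comparison of three shifted slices of `x`. The function name `zero_peaks` and the parameter `fill` are hypothetical. This still loops over the rows in Python, but it replaces the inner per-element loop with array operations.

```python
import numpy as np

def zero_peaks(x, fill):
    """Return a copy of `fill` with zeros where the question's condition holds.

    x    : 1-D array of length N
    fill : (N // 2, N) array of random values (hypothetical parameter)
    """
    N = x.size
    L = N // 2
    matrix = fill.copy()
    for k in range(L):
        s = k + 1                      # neighbour distance used in row k
        center = x[s:N - s]            # x[i] for i in [k+1, N-k-1)
        left = x[:N - 2 * s]           # x[i - k - 1] for the same i
        right = x[2 * s:]              # x[i + k + 1] for the same i
        peak = (center > left) & (center > right)
        # matrix[k, s:N - s] is a view, so this writes into matrix itself
        matrix[k, s:N - s][peak] = 0
    return matrix

rng = np.random.default_rng(0)         # fixed seed only for repeatability
N = 40                                 # small size, just for the sketch
x = 5 + 0.1 * rng.random(N)
fill = rng.random((N // 2, N))
result = zero_peaks(x, fill)
```

Entries outside the interior range, and interior entries that fail the test, keep their random values from `fill`, which matches what the original loops assign in those positions.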
