Numpy optimization

Question

I have a function that assigns values depending on a condition. My dataset size is usually in the range of 30-50k. I am not sure if this is the correct way to use numpy, but when there are more than 5k numbers it gets really slow. Is there a better way to make it faster?

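import numpy as np

N = 5000  # dataset size
L = N // 2
d = 0.1
constant = 5

x = constant + d * np.random.random(N)

matrix = np.zeros([L, N])

print("Assigning matrix")
for k in range(L):
    # columns too close to the left edge to have a left neighbour
    for i in range(k + 1):
        matrix[k, i] = np.random.random()

    # interior columns: zero where x[i] exceeds both neighbours at distance k+1
    for i in range(k + 1, N - k - 1):
        if x[i] > x[i - k - 1] and x[i] > x[i + k + 1]:
            matrix[k, i] = 0
        else:
            matrix[k, i] = np.random.random()

    # columns too close to the right edge to have a right neighbour
    for i in range(N - k - 1, N):
        matrix[k, i] = np.random.random()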

Adam Hughes · Accepted Answer · 2015-01-18 22:14:09Z

3

If you are using for loops, you are going to lose the speed of NumPy. The way to get speed is to use NumPy's functions and vectorized operations. Is there a way you can create a random matrix:

matrix = np.random.random((L, N))

Then do something to this matrix to get the zeros positioned where you want? Can you elaborate on the condition for setting an entry to 0? For example, you can make the matrix and then do:

matrix[matrix > value]

to retain all values above a threshold. If the condition can be expressed as a boolean indexer or an arithmetic operation, you can speed it up. If it has to be in the for loop (i.e., it depends on the values surrounding it as the loop cycles), it may not be possible to vectorize.
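As a minimal sketch of the boolean-indexing idea, assuming a small illustrative shape and a hypothetical cutoff named `threshold` (neither comes from the question), a mask can be used both to select values and to assign to them in place:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, assumed here only for repeatability
matrix = rng.random((4, 6))     # small illustrative shape, uniform values in [0, 1)
threshold = 0.5                 # hypothetical cutoff, not from the question

mask = matrix > threshold       # boolean array with the same shape as matrix
selected = matrix[mask]         # 1-D copy of the values above the threshold
matrix[mask] = 0                # in-place assignment through the same mask
```

The same pattern applies to any condition that can be computed as a boolean array of the matrix's shape.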

3 Comments

chad Over a year ago

Thanks for your reply Adam, you are right, I could probably make a random matrix first and then assign the 0 entries to it. Is it possible to do that with something like matrix[(x[i] > x[i-k-1]) and (x[i] > x[i+k+1])] = 0 after creating the random matrix? The condition for assigning the 0 entries is basically what you have seen there: if the current element of x is larger than both the element k+1 positions before it and the element k+1 positions after it, the entry is set to 0.
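One possible way to sketch this, assuming the condition depends only on `x` as in the question's code, is to keep the loop over `k` but replace the inner loops with slice comparisons. The function name `fill_matrix` and the seeded generator are illustrative choices, not part of the original post:

```python
import numpy as np

def fill_matrix(x, L, rng):
    """Random L x N matrix, zeroed where x[i] exceeds both neighbours at distance k+1."""
    N = x.size
    matrix = rng.random((L, N))       # start fully random, then zero selected entries
    for k in range(L):
        s = k + 1                     # neighbour offset used by row k
        if N - 2 * s <= 0:            # no interior columns left for this row
            continue
        centre = x[s:N - s]           # x[i]     for i in [s, N - s)
        left = x[:N - 2 * s]          # x[i - s] for the same i
        right = x[2 * s:]             # x[i + s] for the same i
        peak = (centre > left) & (centre > right)
        matrix[k, s:N - s][peak] = 0  # the slice is a view, so this writes into matrix
    return matrix

rng = np.random.default_rng(0)        # seed assumed only for repeatability
x = 5 + 0.1 * rng.random(200)         # same construction as the question, smaller N
matrix = fill_matrix(x, x.size // 2, rng)
```

Unlike the original loop, this draws a random number for every entry and then overwrites some with 0. The surviving entries are still uniform on [0, 1), so the result should be statistically equivalent, though not identical draw for draw.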

Adam Hughes Over a year ago

So the assignment of the zeros has to depend on the neighboring values of the array? If there's no way to get around that, I think what people usually do is go to a library like Cython and write a for loop like the one you have, which then runs at the C level for speed. AFAIK, you can't speed up an operation that has to iterate item by item. If you outline the reasoning for this operation, maybe someone else knows a vectorized way to do it.

Sven Marnach Over a year ago

It should be possible to construct the mask in pure NumPy using some tricks, but the code will likely become hard to understand. As Adam said, using Cython or ctypes instead is probably your better bet.
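A rough sketch of what such a pure-NumPy mask might look like, assuming broadcasting over row and column index arrays (the helper name `peak_mask` is hypothetical). It builds several L x N temporaries, so it trades memory for the removal of the Python loop and may not be practical at the largest sizes mentioned in the question:

```python
import numpy as np

def peak_mask(x, L):
    """Boolean L x N mask: True where row k, column i meets the zeroing condition."""
    N = x.size
    s = np.arange(1, L + 1)[:, None]    # neighbour offset k + 1, one per row
    i = np.arange(N)[None, :]           # column index
    interior = (i >= s) & (i < N - s)   # columns where both neighbours exist
    left = x[np.clip(i - s, 0, N - 1)]  # clipped indices are discarded via `interior`
    right = x[np.clip(i + s, 0, N - 1)]
    return interior & (x[None, :] > left) & (x[None, :] > right)

rng = np.random.default_rng(0)          # seed assumed only for repeatability
x = 5 + 0.1 * rng.random(200)           # same construction as the question, smaller N
matrix = rng.random((100, 200))
matrix[peak_mask(x, 100)] = 0           # zero every entry selected by the mask
```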
