Is there a function which allows me to quickly compare and set values in a numpy array against a fixed value?

E.g., assume I have an array with numerical values like this:

0 0 0 3 7 3 0 0 0

I'd like to say: for index positions 3 up to (but not including) 7, set the value to 5 if it is lower than 5. The result would be this:

0 0 0 5 7 5 5 0 0

The reason I'm asking is that doing this operation "by hand" in a loop seems to be extremely slow. E.g., the following code takes ~90 s to perform this operation 1 million times, each time on 64 consecutive elements of a 1-million-element array:

import numpy as np
import random

tsize = 1000000
arr = np.zeros(tsize, dtype=np.uint32)

for rounds in range(tsize):
    num = random.randint(1, 123456)        # generate a random number
    apos = random.randint(0, tsize - 64)   # a random position
    for kpos in range(apos, apos + 64):    # loop to compare and set 64 elements
        if arr[kpos] < num:
            arr[kpos] = num

If there is no such function: are there any obvious NumPy newbie mistakes in the code above that slow it down?

2 Answers

2

The for loop can be replaced with a slice and assignment, like so:

arr[apos:apos+64] = np.clip(arr[apos:apos+64], a_min=num, a_max=None)

You can also use np.maximum:

arr[apos:apos+64] = np.maximum(arr[apos:apos+64], num)
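Both lines above build a temporary array and then copy it back into the slice. As a minimal sketch, assuming the goal is only to raise a contiguous window to a floor value, the ufunc's `out` argument can write the result directly into the slice, which is a view on the original array. The helper name `raise_floor_inplace` is made up for illustration and is not part of NumPy.

```python
import numpy as np

def raise_floor_inplace(arr, start, stop, floor):
    """Raise every element of arr[start:stop] to at least `floor`, in place."""
    window = arr[start:stop]               # basic slicing returns a view, not a copy
    np.maximum(window, floor, out=window)  # result is written straight back into arr
    return arr

a = np.array([0, 0, 0, 3, 7, 3, 0, 0, 0])
raise_floor_inplace(a, 3, 7, 5)
print(a)  # [0 0 0 5 7 5 5 0 0]
```

Whether this is measurably faster than the plain assignment for a 64-element window is not something the timings here show; it mainly avoids the intermediate copy.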

Timing:

import numpy as np
import random

tsize = 1000
arr = np.zeros(tsize, dtype=np.uint32)

%%timeit
for rounds in range(tsize):
    num = random.randint(1, 123456)        # generate a random number
    apos = random.randint(0, tsize - 64)   # a random position
    for kpos in range(apos, apos + 64):    # loop to compare and set 64 elements
        if arr[kpos] < num:
            arr[kpos] = num
# 10 loops, best of 3: 107 ms per loop

%%timeit
for rounds in range(tsize):
    num = random.randint(1, 123456)        # generate a random number
    apos = random.randint(0, tsize - 64)   # a random position
    arr[apos:apos+64] = np.clip(arr[apos:apos+64], a_min=num, a_max=None)
# 100 loops, best of 3: 4.14 ms per loop

%%timeit
for rounds in range(tsize):
    num = random.randint(1, 123456)        # generate a random number
    apos = random.randint(0, tsize - 64)   # a random position
    arr[apos:apos+64] = np.maximum(arr[apos:apos+64], num)
# 100 loops, best of 3: 4.13 ms per loop

%%timeit
# @Alexander's solution (the clip method on the array itself)
for rounds in range(tsize):
    num = random.randint(1, 123456)        # generate a random number
    apos = random.randint(0, tsize - 64)   # a random position
    arr[apos:apos+64] = arr[apos:apos+64].clip(min=num)
# 100 loops, best of 3: 3.69 ms per loop
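The `%%timeit` cells above only run inside IPython/Jupyter. As a rough sketch of how the same comparison could be run as a plain script, the code below seeds Python's `random` module so that every variant sees the same sequence of numbers and positions, checks that all variants produce the same array as the original loop, and then prints timings from the standard `timeit` module. The function names (`run`, `with_loop`, and so on) are made up for this example, and no timings are quoted because they depend on the machine.

```python
import random
import timeit

import numpy as np

def run(update, tsize=1000, width=64, seed=0):
    """Apply `update` tsize times to random windows of a zeroed uint32 array."""
    rng = random.Random(seed)                  # same seed -> same num/apos sequence
    arr = np.zeros(tsize, dtype=np.uint32)
    for _ in range(tsize):
        num = rng.randint(1, 123456)           # generate a random number
        apos = rng.randint(0, tsize - width)   # a random position
        update(arr, apos, apos + width, num)
    return arr

def with_loop(arr, lo, hi, num):
    # Element-by-element version from the question
    for kpos in range(lo, hi):
        if arr[kpos] < num:
            arr[kpos] = num

def with_np_clip(arr, lo, hi, num):
    arr[lo:hi] = np.clip(arr[lo:hi], a_min=num, a_max=None)

def with_maximum(arr, lo, hi, num):
    arr[lo:hi] = np.maximum(arr[lo:hi], num)

def with_method_clip(arr, lo, hi, num):
    arr[lo:hi] = arr[lo:hi].clip(min=num)

variants = (with_loop, with_np_clip, with_maximum, with_method_clip)
reference = run(with_loop)

for fn in variants:
    # Every variant must give exactly the same array as the plain loop
    assert np.array_equal(run(fn), reference)
    seconds = timeit.timeit(lambda: run(fn), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per run")
```

Raising `tsize` to 1000000 should reproduce the full-size experiment from the question, at the cost of a much longer run for the loop version.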

3 Comments

Could you please also test Alexander's version (in a full executable script, not with timeit) on your machine and, if you also find that it is quicker than your version, expand your answer? On my machine, your clip version took ~80 s and his took ~60 s, though they are basically the same function.
Sure, I just added the timing here, and it confirms your observation. It seems np.clip has a little more overhead than the clip method on the array instance.
@Psidom Of course, the function call will always add some overhead compared to the array's own clip method.
2

You can use clip with array indexing.

a = np.array([0, 0, 0, 3, 7, 3, 0, 0, 0])
a[3:7] = a[3:7].clip(min=5)
>>> a
array([0, 0, 0, 5, 7, 5, 5, 0, 0])
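Note that `a[3:7]` covers indices 3, 4, 5 and 6, which matches the expected result in the question. If the condition is something other than a simple lower bound, a boolean mask on the slice is a more literal translation of "compare and set". The following is only a sketch of that alternative, using the same example data; it was not part of the timings above.

```python
import numpy as np

a = np.array([0, 0, 0, 3, 7, 3, 0, 0, 0])

window = a[3:7]          # view on indices 3, 4, 5, 6 of a
window[window < 5] = 5   # assign only where the comparison holds; writes through to a

print(a)  # [0 0 0 5 7 5 5 0 0]
```

Because `window` is a view, the masked assignment changes `a` itself, so no copy back into the slice is needed.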

1 Comment

I would so much love to also accept your answer. Your version is noticeably faster than his, but you were a couple of seconds later.
