how to implement this array algorithm in a more efficient way?

Question

Assuming I have n = 3 lists of same length for example:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

I need to find the first index t that verifies the n = 3 following conditions for a period p = 2. Edit: the meaning of period p is the number of consecutive "boxes".

R1[t] >= 5, R1[t+1] >= 5. Here t +p -1 = t+1, we need to only verify for two boxes t and t+1. If p was equal to 3 we will need to verify for t, t+1 and t+2. Note that It's always the same number for which we test, we always test if it's greater than 5 for every index. The condition is always the same for all the "boxes".
R2[t] >= 2, R2[t+1] >= 2
R3[t] >= 9, R3[t+1] >= 9

In total there is 3 * p conditions.

Here the t I am looking for is 5 (indexing is starting from 0).

The basic way to do this is by looping on all the indexes using a for loop. If the condition is found for some index t we store it in some local variable temp and we verify the conditions still hold for every element whose index is between t+1 and t+p -1. If while checking, we find an index that does not satisfy a condition, we forget about the temp and we keep going.

What is the most efficient way to do this in Python if I have large lists (like of 10000 elements)? Is there a more efficient way than the for loop?

Show an example and mcve. You're looking for more efficient, but I don't see a first order implementation — Mad Physicist
– Mad Physicist, Commented Jul 5, 2020 at 23:25

paime · Accepted Answer · 2020-07-06 09:29:49Z

Since all your conditions are the same (>=), we could leverage this.

This solution will work for any number of conditions and any size of analysis window, and no for loop is used.

You have an array:

>>> R = np.array([R1, R2, R3]).T                                                                                                                                                                         
>>> R
array([[7, 8, 1],
       [5, 0, 7],
       [8, 2, 5],
       [6, 2, 9],
       [0, 0, 0],
       [6, 2, 9],
       [7, 2, 9]]

and you have thresholds:

>>> thresholds = [5, 2, 9]

So you can check where the conditions are met:

>>> R >= thresholds
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])

And where they all met at the same time:

>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False,  True, False,  True,  True])

From there, you want the conditions to be met for a given window.

We'll use the fact that booleans can sum together, and convolution to apply the window:

>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])

The resulting array will have values equal to win_size at the indices where all conditions are met on the window range.

So let's retrieve the first of those indices:

>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5

If such an index doesn't exist, it will raise an IndexError, I'm letting you handle that.

So, as a one-liner function, it gives:

def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]

I added the condition as an argument to the function, to be more general.

>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5

Hi @paime and welcome! Thanks for the detailed answer, it's awesome :) I have a question, what if I need to pass some minimal_index to the function idx_conditions and I will look for the first_index greater than the minimal_index such that the conditions are satisfied. I thinking about a while loop but something needs to vary from one iteration to another.
Hi @JoffreyL., you're welcome. To implement a minimal_index, just add it as an optional parameter : def idx_conditions(..., minimal_index=0), then add arr = arr[minimal_index:] as the first line of the function, and eventually add ... + minimal_index to the returned value.
Hi @paime, thank you :) Assume that the method couldn't found the index, how to make it return the index of the minimum possible time window for which the conditions are verified?
@JoffreyL. You can make recursive call. Just try/catch when you get the index: try: np.where(...)[0][0] except IndexError: return idx_conditions(... win_size-1) if win_size > 1 else False. Or make it a while instead of recursive, to avoid re-evaluation of the np.all() line. Also, if you do it like this it means that minimal_index is stronger than the win_size. So this function will begin to be tricky, you may want to split it.
@paime Any hint on how to split it? The problem with the idea is that it's not sufficient if the win_size = 100 but the only possible window is for 2 periods, you will have tens of "useless" iterations

MrNobody33 · Accepted Answer · 2020-07-05 23:04:04Z

This could be a way:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
    if (R1[i]>=5 and R1[inext]>=5)&(R2[i]>=2 and R2[inext]>=2)&(R3[i]>=9 and R3[inext]>=9):
        print(i)

Output:

Edit: Generalization could be:

def foo(ls,conditions):
    index=0
    for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            index=i
    return index


R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

R4 = [1,7,5,9,0,1,1]

R5 = [1,7,5,9,0,3,3]



conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

And, maybe if the arrays match the conditions multiple times, you could return a list of the indexes:

def foo(ls,conditions):
    index=[]
    for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            print(i)
            index.append(i)
    return index


R1 = [6,7,8,6,0,6,7]

R2 = [2,2,2,2,0,2,2]

R3 = [9,9,5,9,0,9,9]

R4 = [1,1,5,9,0,1,1]

R5 = [3,3,5,9,0,3,3]

conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

[0,5]

Thanks :), I will test it against the basic version I described. How could we generalize it if there is N resources instead of 3 with N as input?
yes but this N is not known in advance. I mean I cannot write if (R1[i]>=a and R1[inext]>=a)&(R2[i]>=b and R2[inext]>=b)&(R3[i]>=c and R3[inext]>=c) ... (RN[i]>=c and RN[inext]>=c) Those ...does not exist. N could be 3, 6, 1, whatever.
your method seems to be the most efficient, I am trying to find a way to generalize it
thank you :) Do you have an idea why your solution is more efficient than both mine and the one proposed by @asalimih? I assume this question needs some under-the-hood knowledge of python.

asalimih · Accepted Answer · 2020-07-05 23:11:56Z

Here is a solution using numpy ,without for loops:

import numpy as np
R1 = np.array([7,5,8,6,0,6,7])
R2 = np.array([8,0,2,2,0,2,2])
R3 = np.array([1,7,5,9,0,9,9])
a = np.logical_and(np.logical_and(R1>=5,R2>=2),R3>=9)
np.where(np.logical_and(a[:-1],a[1:]))[0].item()

ouput

Edit:
Generalization
Say you have a list of lists R and a list of conditions c:

R = [[7,5,8,6,0,6,7],
     [8,0,2,2,0,2,2],
     [1,7,5,9,0,9,9]]
c = [5,2,9]

First we convert them to numpy arrays. the reshape(-1,1) converts c to a column matrix so that we can use pythons broadcasting feature in the >= operator

R = np.array(R)
c = np.array(c).reshape(-1,1)
R>=c

output:
array([[ True,  True,  True,  True, False,  True,  True],
       [ True, False,  True,  True, False,  True,  True],
       [False, False, False,  True, False,  True,  True]])

then we perform logical & operation between all rows using reduce function

a = np.logical_and.reduce(R>=c)
a
output:
array([False, False, False,  True, False,  True,  True])

next we create two arrays by removing first and last element of a and perform a logical & between them which shows which two subsequent elements satisfied the conditions in all lists:

np.logical_and(a[:-1],a[1:])
output:
array([False, False, False, False, False,  True])

now np.where just shows the index of the True element

np.where(np.logical_and(a[:-1],a[1:]))[0].item()
output:
5

Thanks :), any idea how to generalize it for N lists with N as input (it could be 3, 6, 1, whatever)?
Thanks for the detailed explanation :) Do you have an idea why your solution is more efficient than mine but less efficient than the one suggested by @MrNobody33. I assume this question needs some under-the-hood knowledge of python.
in terms of performance and run time my solution would be much better in comparison to for loop solutions in larger lists because of vectorized operations.
Yup, I 've just verified it. How about the scaling not by list size but the number of lists? Does the same argument still hold?

Collectives™ on Stack Overflow

how to implement this array algorithm in a more efficient way?

3 Answers 3

6 Comments

8 Comments

5 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

8 Comments

5 Comments

Linked

Related