Apply formula to get Numpy matrix efficiently

Question

I am trying to parse through 2 vectors, and fill a matrix based on a formula. This is the way I am doing it, it is highly inefficient.

import numpy as np

list1 = [1, 2, 3, 4]
list2 = [20, 30, 40, 50, 60, 70, 80, 90]

array1 = np.array(list1)
array2 = np.array(list2)

columns = len(list1)
rows = len(list2)

matrix = np.zeros((rows, columns))

for column in range(0, columns):
    for row in range(2*column, rows):
        matrix[row, column] = round(10 * (array2[row] - array1[column]), 0)

print(matrix)

The output should be

[[190.   0.   0.   0.]
 [290.   0.   0.   0.]
 [390. 380.   0.   0.]
 [490. 480.   0.   0.]
 [590. 580. 570.   0.]
 [690. 680. 670.   0.]
 [790. 780. 770. 760.]
 [890. 880. 870. 860.]]

This is an example, the real arrays are large. How can I use numpy built-in code to do this in the most efficient and optimized way?

Thank you

Julien · Accepted Answer · 2021-05-06 07:29:49Z

Check this:

list1 = list(range(1,50))
list2 = list(range(20,1000,10))
array1 = np.array(list1)
array2 = np.array(list2)

columns = len(list1)
rows = len(list2)

# yours
def f():
    matrix = np.zeros((rows, columns))
    for column in range(0, columns):
        for row in range(2*column, rows):
            matrix[row, column] = round(10 * (array2[row] - array1[column]), 0)
    return matrix

def g():
    col, row = np.meshgrid(np.arange(columns), np.arange(rows))
    mask = row>=2*col
    matrix = np.where(mask, np.round(10*(array2[:,None] - array1), 0), 0)
    return matrix

col, row = np.meshgrid(np.arange(columns), np.arange(rows))
mask = row>=2*col
def h():
    matrix = np.where(mask, np.round(10*(array2[:,None] - array1), 0), 0)
    return matrix


import timeit

%timeit f()
# 3.18 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit g()
# 64.7 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit h()
# 21.7 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

f, g and h are comparable on your small array but then g and h get much faster on bigger ones.

h is a further optimization on g if you need to do the same sort of computation several times on same size lists, since you can compute your mask only once...

Thank you for the answer, this is a huge improvement on time. Before It would take like 20 seconds to compute the real data, now its less than 1 sec.

RandomGuy · Accepted Answer · 2021-05-06 09:54:43Z

The answer above is much more detailed, but I still wanted to present this concise solution, easier to understand. It relies on the mathematical formulation of your problem, since you only want to substract two 1D-arrays at each coordinate of your matrix, an easy way to do so is to "transform" your 1D-arrays into well written 2D ones with np.tile, which you then only have to substract.

Finally, to select only the coordinates verifying row >= 2*column, you can create a boolean array (y >= 2*x), which when multiplied by your previous array C, will put to 0 all coordinates that do not check this condition.

a = np.array([1, 2, 3, 4])
b = np.array([20, 30, 40, 50, 60, 70, 80, 90])
n = len(a)
m = len(b)

A = np.tile(a, (m,1))
B = np.tile(b, (n,1)).T

'''
At this point, we have :
A = [[1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]]
and
B = [[20 20 20 20]
     [30 30 30 30]
     [40 40 40 40]
     [50 50 50 50]
     [60 60 60 60]
     [70 70 70 70]
     [80 80 80 80]
     [90 90 90 90]]

so by definition matrix is exactly 10 * (B-A) ! 
It is really easy to see by writting down a bit of maths, 
since matrix[i,j] = 10 * (b[i] - a[j]).
'''

C = np.round(10*(B-A), 0)

x,y = np.meshgrid(np.arange(n), np.arange(m))

matrix = C * (y >= 2*x)

Why use separate ways to make the 2d arrays, tile and meshgrid? Can't you use the same method twice? How about the sparse version of meshgrid? Or broadcasting?
This was the easiest solution that came to my mind when doing this (in fact, I saw the condition on rows and columns only after finishing the array C), and I haven't thought of anything else to produce a cleaner result. But if you have a better solution, feel free to share it ! It could benefit me as well :)

Collectives™ on Stack Overflow

Apply formula to get Numpy matrix efficiently

2 Answers 2

1 Comment

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Related