You can avoid loops for both the d2m matrix creation and the x_smf computation by using sps.diags to build a sparse tridiagonal matrix, which you then cast to an array so you can edit the first and last rows. Your code will look like this (note that the result of diags has been cast to a dense ndarray with the scipy.sparse.dia_matrix.toarray method):
import numpy as np
import scipy.sparse as sps

num = 5  # size used in the question
# Dense tridiagonal matrix
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray()  # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
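For num = 5 (the size in the question), this builds the following matrix, which you can verify yourself:
print(d2m)
# [[ 2. -5.  4. -1.  0.]
#  [ 1. -2.  1.  0.  0.]
#  [ 0.  1. -2.  1.  0.]
#  [ 0.  0.  1. -2.  1.]
#  [ 0. -1.  4. -5.  2.]]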
The solution proposed by Valdi_Bo enables you to remove the second FOR loop:
x_smf = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
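If you want to convince yourself that this one-liner matches the accumulation loop, here is a minimal self-contained sanity check (the random A and num = 5 are just illustrative assumptions):
import numpy as np
import scipy.sparse as sps

num = 5
sm_factor = 0.1 * np.ones(num)  # same smoothing factors as in the question
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray()
d2m[0, 0:4] = [2, -5, 4, -1]            # first-row boundary conditions
d2m[num-1, num-4:num] = [-1, 4, -5, 2]  # last-row boundary conditions

A = np.random.default_rng(0).random((num, num))  # arbitrary test data

x_loop = sum(sm_factor[i] * (d2m @ A[i, :]) @ (d2m @ A[i, :]) for i in range(num))
x_vec = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
assert np.isclose(x_loop, x_vec)  # both compute the same scalar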
However, I want to draw your attention to the fact that the d2m matrix is sparse, and storing it as a dense ndarray is bad for both computation time and memory usage. Instead of casting it to a dense ndarray, I advise you to cast it to a sparse matrix format, for example lil_matrix (a list-of-lists sparse format), using the tolil() method instead of toarray():
# Sparse tridiagonal matrix
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
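The reason for choosing lil here is that it is the scipy format designed for incremental edits; here is a quick sketch of what happens with the other natural candidates (exact error messages may vary across scipy versions):
d2m_dia = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num))
# d2m_dia[0, 0] = 2   # fails: the dia format does not support item assignment
d2m_csr = d2m_dia.tocsr()
d2m_csr[0, 3] = -1    # works, but inserting a new nonzero into csr
                      # triggers a SparseEfficiencyWarning
d2m_lil = d2m_dia.tolil()
d2m_lil[0, 3] = -1    # lil handles structure changes cheaply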
Here is a script that compares the three implementations on a bigger case, num=4000 (for num=5 all three give 324). At this size I already see a benefit from using the sparse matrix; here is the whole script (the first lines generalise the code to num different from 5):
from time import time
import numpy as np
import scipy.sparse as sps
num = 4000
# Generalisation of the question's A matrix and sm_factor to any num
A = np.concatenate([np.arange(1, (num-2)*num+1).reshape(num, num-2),
                    np.repeat([[4, 5]], num, axis=0)], axis=1)
sm_factor = 0.1*np.ones(num)
########## DENSE matrix + FOR loop ##########
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
# FOR loop version
t_start = time()
x_smf = 0
for i in range(len(sm_factor)):
    x_smf = x_smf + sm_factor[i] * (d2m @ (A[i, :]).T).T @ (d2m @ (A[i, :]).T)
print(f'FOR loop version time: {time()-t_start}s')
print(f'FOR loop version value: {x_smf}\n')
########## DENSE matrix + VECTORIZED ##########
t_start = time()
x_smf_v = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
print(f'VECTORIZED version time: {time()-t_start}s')
print(f'VECTORIZED version value: {x_smf_v}\n')
########## SPARSE matrix + VECTORIZED ##########
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
# First line boundary conditions
d2m_s[0, 0] = 2
d2m_s[0, 1] = -5
d2m_s[0, 2] = 4
d2m_s[0, 3] = -1
# Last line boundary conditions
d2m_s[num-1, num-4] = -1
d2m_s[num-1, num-3] = 4
d2m_s[num-1, num-2] = -5
d2m_s[num-1, num-1] = 2
t_start = time()
x_smf_s = np.sum(sm_factor * np.square(d2m_s @ A.T).sum(axis=0))
print(f'SPARSE+VECTORIZED version time: {time()-t_start}s')
print(f'SPARSE+VECTORIZED version value: {x_smf_s}\n')
Here is what I get when running the code:
FOR loop version time: 25.878241777420044s
FOR loop version value: 3.752317536763356e+17
VECTORIZED version time: 1.0873610973358154s
VECTORIZED version value: 3.752317536763356e+17
SPARSE+VECTORIZED version time: 0.37279224395751953s
SPARSE+VECTORIZED version value: 3.752317536763356e+17
As you can see, using a sparse matrix gains another factor of about 3 in computation time and does not require you to adapt the code that comes afterwards. It is also a good strategy to test the various scipy sparse matrix formats (tocsc(), tocsr(), todok(), etc.); some may be better suited to your case.
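For example, a quick variant to try (a sketch; timings are machine-dependent): build the matrix in lil, then convert to csr, which is generally the fastest format for matrix products:
d2m_csr = d2m_s.tocsr()  # build in lil, convert to csr for the products
t_start = time()
x_smf_c = np.sum(sm_factor * np.square(d2m_csr @ A.T).sum(axis=0))
print(f'CSR+VECTORIZED version time: {time()-t_start}s')
print(f'CSR+VECTORIZED version value: {x_smf_c}')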
As a side note, the boundary-condition assignments can each be written in a single line, e.g.:
d2m[0, 0:4] = [2, -5, 4, -1]
An equivalent way to write the sum is
(((d2m @ A.T) * (d2m @ A.T)).sum(0) * sm_factor).sum()
if you have a different sm_factor for every iteration. Otherwise you can replace sm_factor with .1 and set a = d2m @ A.T to avoid redundant computations:
((a * a).sum(0) * .1).sum()
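Putting those last tips together (a sketch: the mirrored assignment for the last row is my assumption, by symmetry with the first row):
d2m[0, 0:4] = [2, -5, 4, -1]              # first row in a single assignment
d2m[num-1, num-4:num] = [-1, 4, -5, 2]    # assumed mirror image for the last row
a = d2m @ A.T                             # compute the product once
x_smf = ((a * a).sum(0) * sm_factor).sum()  # reuse it instead of recomputing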