You can avoid loops for both the d2m matrix creation and the x_smf computation by using sps.diags to build a sparse tridiagonal matrix, which you then cast to an array so you can edit the first and last rows. Your code will look like this (note that the result of diags has been cast to a dense ndarray with the scipy.sparse.dia_matrix.toarray method):
import numpy as np
import scipy.sparse as sps

num = 5  # size used in the question
# Dense tridiagonal matrix
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray()  # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
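For num = 5 (the size in the question), this builds the following matrix, which you can verify yourself:
print(d2m)
# [[ 2. -5.  4. -1.  0.]
#  [ 1. -2.  1.  0.  0.]
#  [ 0.  1. -2.  1.  0.]
#  [ 0.  0.  1. -2.  1.]
#  [ 0. -1.  4. -5.  2.]]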
The solution proposed by Valdi_Bo enables you to remove the second FOR loop:
x_smf = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
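If you want to convince yourself that this one-liner matches the accumulation loop, here is a minimal self-contained sanity check (the random A and num = 5 are just illustrative assumptions):
import numpy as np
import scipy.sparse as sps

num = 5
sm_factor = 0.1 * np.ones(num)  # same smoothing factors as in the question
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray()
d2m[0, 0:4] = [2, -5, 4, -1]            # first-row boundary conditions
d2m[num-1, num-4:num] = [-1, 4, -5, 2]  # last-row boundary conditions

A = np.random.default_rng(0).random((num, num))  # arbitrary test data

x_loop = sum(sm_factor[i] * (d2m @ A[i, :]) @ (d2m @ A[i, :]) for i in range(num))
x_vec = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
assert np.isclose(x_loop, x_vec)  # both compute the same scalar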
However, I want to draw your attention to the fact that the d2m matrix is sparse, and storing it as a dense ndarray is bad for both computation time and memory usage. Instead of casting it to a dense ndarray, I advise you to cast it to a sparse matrix format, for example lil_matrix (a list-of-lists sparse format), using the tolil() method instead of toarray():
# Sparse tridiagonal matrix
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
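The reason for choosing lil here is that it is the scipy format designed for incremental edits; here is a quick sketch of what happens with the other natural candidates (exact error messages may vary across scipy versions):
d2m_dia = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num))
# d2m_dia[0, 0] = 2   # fails: the dia format does not support item assignment
d2m_csr = d2m_dia.tocsr()
d2m_csr[0, 3] = -1    # works, but inserting a new nonzero into csr
                      # triggers a SparseEfficiencyWarning
d2m_lil = d2m_dia.tolil()
d2m_lil[0, 3] = -1    # lil handles structure changes cheaply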
Here is a script that compares the three implementations on a bigger case, num=4000 (for num=5 all three give 324). At this size I already see a benefit from using the sparse matrix; here is the whole script (the first lines generalise the code to num different from 5):
from time import time
import numpy as np
import scipy.sparse as sps
num = 4000
# Generalisation of the question's A matrix and sm_factor to any num
A = np.concatenate([np.arange(1, (num-2)*num+1).reshape(num, num-2),
                    np.repeat([[4, 5]], num, axis=0)], axis=1)
sm_factor = 0.1*np.ones(num)
########## DENSE matrix + FOR loop ##########
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
# FOR loop version
t_start = time()
x_smf = 0
for i in range(len(sm_factor)):
    x_smf = x_smf + sm_factor[i] * (d2m @ (A[i, :]).T).T @ (d2m @ (A[i, :]).T)
print(f'FOR loop version time: {time()-t_start}s')
print(f'FOR loop version value: {x_smf}\n')
########## DENSE matrix + VECTORIZED ##########
t_start = time()
x_smf_v = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
print(f'VECTORIZED version time: {time()-t_start}s')
print(f'VECTORIZED version value: {x_smf_v}\n')
########## SPARSE matrix + VECTORIZED ##########
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
# First line boundary conditions
d2m_s[0, 0] = 2
d2m_s[0, 1] = -5
d2m_s[0, 2] = 4
d2m_s[0, 3] = -1
# Last line boundary conditions
d2m_s[num-1, num-4] = -1
d2m_s[num-1, num-3] = 4
d2m_s[num-1, num-2] = -5
d2m_s[num-1, num-1] = 2
t_start = time()
x_smf_s = np.sum(sm_factor * np.square(d2m_s @ A.T).sum(axis=0))
print(f'SPARSE+VECTORIZED version time: {time()-t_start}s')
print(f'SPARSE+VECTORIZED version value: {x_smf_s}\n')
Here is what I get when running the code:
FOR loop version time: 25.878241777420044s
FOR loop version value: 3.752317536763356e+17
VECTORIZED version time: 1.0873610973358154s
VECTORIZED version value: 3.752317536763356e+17
SPARSE+VECTORIZED version time: 0.37279224395751953s
SPARSE+VECTORIZED version value: 3.752317536763356e+17
As you can see, using a sparse matrix gains another factor of about 3 in computation time and does not require you to adapt the code that comes afterwards. It is also a good strategy to test the various scipy sparse matrix formats (tocsc(), tocsr(), todok(), etc.); some may be better suited to your case.
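For example, a quick variant to try (a sketch; timings are machine-dependent): build the matrix in lil, then convert to csr, which is generally the fastest format for matrix products:
d2m_csr = d2m_s.tocsr()  # build in lil, convert to csr for the products
t_start = time()
x_smf_c = np.sum(sm_factor * np.square(d2m_csr @ A.T).sum(axis=0))
print(f'CSR+VECTORIZED version time: {time()-t_start}s')
print(f'CSR+VECTORIZED version value: {x_smf_c}')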
As a side note, the boundary-condition assignments can each be written in a single line, e.g.:
d2m[0, 0:4] = [2, -5, 4, -1]
An equivalent way to write the sum is
(((d2m @ A.T) * (d2m @ A.T)).sum(0) * sm_factor).sum()
if you have a different sm_factor for every iteration. Otherwise you can replace sm_factor with .1 and set a = d2m @ A.T to avoid redundant computations:
((a * a).sum(0) * .1).sum()
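Putting those last tips together (a sketch: the mirrored assignment for the last row is my assumption, by symmetry with the first row):
d2m[0, 0:4] = [2, -5, 4, -1]              # first row in a single assignment
d2m[num-1, num-4:num] = [-1, 4, -5, 2]    # assumed mirror image for the last row
a = d2m @ A.T                             # compute the product once
x_smf = ((a * a).sum(0) * sm_factor).sum()  # reuse it instead of recomputing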