Below are two different alternatives. The first uses ndarray.sum and NumPy integer array indexing. The second alternative uses np.einsum.
def using_sum(T):
total = T.sum(axis=1).sum(axis=0)
m = np.arange(T.shape[0])
trace = T[m, m].sum(axis=0)
return total - trace
def using_einsum(T):
return np.einsum('mnijk->ijk', T) - np.einsum('nnijk->ijk', T)
The first argument of np.einsum specifies the subscripts of the summation.
'mnijk->ijk' indicates that T has subcripts mnijk and only the ijk subscripts remain after summation. Therefore the summation is performed over the m and n subscripts. This makes
np.einsum('mnijk->ijk', T)[i,j,k] equal to np.sum(T[:,:,i,j,k]), but computes the entire array in one vectorized computation.
Similarly, 'nnijk->ijk' tells np.einsum that T has subscripts nnijk and again only the ijk subscripts survive summation. Therefore the summation is over n. Since n is repeated, the summation over n computes the trace.
I like np.einsum because it communicates the intent of the computation
succinctly. But it's also good to be familiar with how using_sum works since
it uses fundamental NumPy operations. It's a good example of how nested loops
can be avoided by using NumPy methods which operate on whole arrays.
Here's a perfplot comparing the performance of orig versus using_sum and using_einsum as a function of n, where T is taken to have shape (10, 10, n, n, n):
import perfplot
import numpy as np
def orig(T):
_, _, nx, ny, nz = T.shape
r = np.zeros((nx, ny, nz))
for i in range(nx):
for j in range(ny):
for k in range(nz):
r[i,j,k] = np.sum(T[:,:,i,j,k])-np.trace(T[:,:,i,j,k])
return r
def using_einsum(T):
r = np.einsum('mnijk->ijk', T) - np.einsum('nnijk->ijk', T)
return r
def using_sum(T):
total = T.sum(axis=1).sum(axis=0)
m = np.arange(T.shape[0])
trace = T[m,m].sum(axis=0)
return total - trace
def make_T(n):
return np.random.random((10,10,n,n,n))
perfplot.show(
setup=make_T,
kernels=[orig, using_sum, using_einsum],
n_range=range(2, 80, 3),
xlabel='n')

perfplot.show also checks that the values returned by orig, using_sum and using_einsum are equal.