
I'm trying to make an Abelian sandpile using numpy arrays in Python. The calculation speed is okay for smaller square matrices, but for larger ones it slows down significantly (a 200x200 matrix with 20000 initial sand particles takes up to 20-30 minutes). Is there any way to speed up / optimize the matrix calculation? The threshold value is 3.

The basic code right now is:

import numpy as np
n = 200
size = (n,n)
x = np.zeros(size)
m = 0 # mean
if n%2 == 0:
    m = int((n+1)/2)
else :
    m = int(n/2)

x[m][m] = 100000
z = int(x[m][m])

def f(x):
    count = 0
    for i in range(0,n):
        for j in range(0,n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i-1 >= 0 :
                    x[i-1][j] = x[i-1][j] + 1 
                if i+1 < n :
                    x[i+1][j] = x[i+1][j] + 1
                if j-1 >= 0 :
                    x[i][j-1] = x[i][j-1] + 1 
                if j+1 < n :    
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count
for k in range(0,z):
    y, count = f(x)
    if count == n**2 :
        break
    elif count < n**2:
        continue
print(y)

I've tried running a 500x500 matrix with 100,000 initial particles, but that took more than 6 hours.

  • Try using parallel processing for each iteration. Commented Apr 26, 2018 at 19:39
  • Use numpy instead of raw Python? Commented Apr 26, 2018 at 19:40
  • @Mad Physicist I did use numpy (I've edited the question to show it). Perhaps you meant something else? Commented Apr 26, 2018 at 20:57
  • You're using numpy for storage but not to vectorize your code. Commented Apr 26, 2018 at 22:03
  • What does "The threshold value is 3" mean? Isn't the critical value always a multiple of 4? Also, I am not sure you implemented this correctly. In your case a pile tumbling to the right and down will affect the possibility of the neighboring piles exceeding their threshold in the current iteration, and I'm pretty sure that isn't right. Commented Apr 26, 2018 at 22:14

2 Answers


You can use numba for this purpose (adding nopython=True or an explicit type signature can give a further speedup):

from numba import jit
import numpy as np

n = 200
size = (n,n)
x = np.zeros(size)
m = 0 # mean
if n%2 == 0:
    m = int((n+1)/2)
else :
    m = int(n/2)

x[m][m] = 100000
z = int(x[m][m])

def f(x):
    count = 0
    for i in range(0,n):
        for j in range(0,n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i-1 >= 0 :
                    x[i-1][j] = x[i-1][j] + 1 
                if i+1 < n :
                    x[i+1][j] = x[i+1][j] + 1
                if j-1 >= 0 :
                    x[i][j-1] = x[i][j-1] + 1 
                if j+1 < n :    
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count

@jit
def f_jit(x):
    count = 0
    for i in range(0,n):
        for j in range(0,n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i-1 >= 0 :
                    x[i-1][j] = x[i-1][j] + 1 
                if i+1 < n :
                    x[i+1][j] = x[i+1][j] + 1
                if j-1 >= 0 :
                    x[i][j-1] = x[i][j-1] + 1 
                if j+1 < n :    
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count

%%timeit
f(x)
28.7 ms ± 602 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
f_jit(x)
59.9 µs ± 7.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
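
As a small follow-up sketch (f_njit and the explicit n argument are just illustrative choices, not part of the original answer): numba's njit is shorthand for jit(nopython=True), which refuses to fall back to the slower object mode, and passing n as a parameter avoids relying on the global variable. Note that the first call includes compilation time, so benchmark a second call.

from numba import njit  # njit is shorthand for jit(nopython=True)

@njit
def f_njit(x, n):
    # Same relaxation sweep as f, compiled in nopython mode.
    count = 0
    for i in range(n):
        for j in range(n):
            if x[i, j] > 3:
                x[i, j] -= 4
                if i - 1 >= 0:
                    x[i - 1, j] += 1
                if i + 1 < n:
                    x[i + 1, j] += 1
                if j - 1 >= 0:
                    x[i, j - 1] += 1
                if j + 1 < n:
                    x[i, j + 1] += 1
            else:
                count += 1
    return x, count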

While vectorising with numpy will not necessarily cut down your algorithmic complexity, it will probably cut down your overhead by a factor of at least a few dozen. As a general rule of thumb, if you find yourself writing explicit loops or using explicit if statements, you should consider rethinking your approach.

What may help you here is simple masking to implement the toppling. If you have the toppling sites marked with a 1 in a mask of the same shape as x, you can directly subtract off the toppled piles and add the distributed sand just by shifting the mask:

mask = (x >= 4)
x[mask] -= 4
x[:, :-1] += mask[:, 1:] # topple left
x[:, 1:] += mask[:, :-1] # topple right
x[:-1, :] += mask[1:, :] # topple up
x[1:, :] += mask[:-1, :] # topple down

If count is just the number of untoppled sites, as in your loop, you can obtain it from the mask with np.count_nonzero:

count = x.size - np.count_nonzero(mask)

If, on the other hand, you are only using count to determine when to stop your loop, you might find it easier to count how many toppling sites there are instead:

count = np.sum(mask)

The outer loop terminates when this version of count reaches zero (or the original version reaches x.size).
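
As a concrete sketch of how these pieces could fit together (relax is just an illustrative name, and an integer array is used in place of the float array from the question), each pass topples every unstable site at once and the loop stops when no site is >= 4:

import numpy as np

def relax(x):
    # Repeatedly topple all unstable sites until the grid is stable.
    while True:
        mask = (x >= 4)
        if not mask.any():          # no unstable sites left -> done
            break
        x[mask] -= 4                # every unstable site loses 4 grains
        x[:, :-1] += mask[:, 1:]    # grains sent to the left neighbours
        x[:, 1:] += mask[:, :-1]    # grains sent to the right neighbours
        x[:-1, :] += mask[1:, :]    # grains sent upwards
        x[1:, :] += mask[:-1, :]    # grains sent downwards
    return x

n = 200
x = np.zeros((n, n), dtype=np.int64)  # integer grains keep the counts exact
x[n // 2, n // 2] = 100000
relax(x)

Because the sandpile is abelian, the final stable configuration does not depend on the order in which sites topple, so toppling all unstable sites simultaneously in each pass gives the same result as the cell-by-cell update in the original loop.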

