I want to generate a 2D numpy array with elements calculated from their positions. Something like the following code:
import numpy as np
def calculate_element(i, j, other_parameters):
    # do something
    return value_at_i_j
def main():
    arr = np.zeros((M, N))  # (M, N) is the shape of the array
    for i in range(M):
        for j in range(N):
            arr[i, j] = calculate_element(i, j, ...)
This code runs extremely slowly, since loops in Python are just not very efficient. Is there any way to do this faster in this case?
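One built-in way to express this pattern, assuming calculate_element is written with vectorized (broadcasting-compatible) operations, is np.fromfunction, which calls the function once with whole index arrays instead of once per element. The calculate_element body below is a hypothetical stand-in, since the real one isn't shown here. Note that fromfunction materializes the same full index arrays internally (via np.indices), so this is a readability win rather than a memory win:

```python
import numpy as np

M, N = 4, 5  # small example shape

# Hypothetical elementwise function: the value depends only on the indices.
# Because fromfunction passes whole arrays, the body must use array
# operations (here: elementwise + and *), not scalar-only logic.
def calculate_element(i, j, scale):
    return (i + j) * scale

# fromfunction calls calculate_element ONCE, with i of shape (M, N)
# and j of shape (M, N), and returns the full result array.
arr = np.fromfunction(lambda i, j: calculate_element(i, j, 2.0), (M, N))
```

This is equivalent to the double loop, but without Python-level iteration.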
By the way, for now I use a workaround by calculating two 2D "index matrices". Something like this:
def main():
    index_matrix_i = np.array([range(M)] * N).T
    index_matrix_j = np.array([range(N)] * M)
    '''
    index_matrix_i is like
    [[0,0,0,...],
     [1,1,1,...],
     [2,2,2,...],
     ...
    ]
    index_matrix_j is like
    [[0,1,2,...],
     [0,1,2,...],
     [0,1,2,...],
     ...
    ]
    '''
    arr = calculate_element(index_matrix_i, index_matrix_j, ...)
Edit1: The code becomes much faster after I apply the "index matrices" trick, so the main question I want to ask is whether there is a way to avoid this trick, since it takes extra memory. In short, I want a solution that is efficient in both time and space.
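One sketch of a time-and-space-efficient alternative, assuming calculate_element uses only broadcasting-compatible operations (as the Gaussian example below does): np.ogrid returns "open" index grids of shape (M, 1) and (1, N), which broadcast against each other to the full (M, N) shape without ever storing two full M-by-N index matrices:

```python
import numpy as np

M, N = 1200, 4000

# ii has shape (M, 1), jj has shape (1, N): O(M + N) memory for the
# indices instead of the O(M * N) cost of the two full index matrices.
ii, jj = np.ogrid[:M, :N]

def calculate_element(i, j, i_mid, j_mid, i_sig, j_sig):
    gaus_i = np.exp(-((i - i_mid) ** 2) / (2 * i_sig ** 2))  # shape (M, 1)
    gaus_j = np.exp(-((j - j_mid) ** 2) / (2 * j_sig ** 2))  # shape (1, N)
    return gaus_i * gaus_j  # broadcasting produces the (M, N) result here

arr = calculate_element(ii, jj, 600, 2000, 300, 500)
```

np.meshgrid(np.arange(M), np.arange(N), indexing='ij', sparse=True) gives the same sparse pair. The full (M, N) array is of course still allocated for the result, but the index arrays and the intermediate gaus_i / gaus_j temporaries stay small.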
Edit2: Some examples I tested
import matplotlib.pyplot as plt

# a simple 2D Gaussian
def calculate_element(i, j, i_mid, j_mid, i_sig, j_sig):
    gaus_i = np.exp(-((i - i_mid)**2) / (2 * i_sig**2))
    gaus_j = np.exp(-((j - j_mid)**2) / (2 * j_sig**2))
    return gaus_i * gaus_j
# size of M, N
M, N = 1200, 4000
# use for loops to go through every element
# this code takes ~10 seconds
def main_1():
    arr = np.zeros((M, N))  # (M, N) is the shape of the array
    for i in range(M):
        for j in range(N):
            arr[i, j] = calculate_element(i, j, 600, 2000, 300, 500)
    # print(arr)
    plt.figure(figsize=(8, 5))
    plt.imshow(arr, aspect='auto', origin='lower')
    plt.show()
# use index matrices
# this code takes <1 second
def main_2():
    index_matrix_i = np.array([range(M)] * N).T
    index_matrix_j = np.array([range(N)] * M)
    arr = calculate_element(index_matrix_i, index_matrix_j, 600, 2000, 300, 500)
    # print(arr)
    plt.figure(figsize=(8, 5))
    plt.imshow(arr, aspect='auto', origin='lower')
    plt.show()




Comments:

- Regardless of how you assign things, you're still calling calculate_element M*N times. If you can cache intermediate results of this function you could speed up the loops.
- The index matrices cost extra memory (on the order of M * N * np.intp each), easily avoidable with numba. But without the body of calculate_element and a size estimate of M, N it's just a guess.
- numpy methods work with or produce whole arrays. As such they can take up a lot of memory, even if it is in temporary buffers. It's easy to estimate memory usage from the shape and dtype. Iteration can avoid those large temporary arrays, though it won't change the size of the final result. But you lose speed, unless you implement the loops in a compilation tool like cython or numba.
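If the temporary buffers mentioned above are the concern and compiled tools like numba or cython aren't an option, one middle ground (a sketch, using the Gaussian example from the question) is to vectorize over one axis at a time: loop over the M rows in Python, but compute each row with length-N array operations, so temporaries never exceed O(N):

```python
import numpy as np

M, N = 1200, 4000
i_mid, j_mid, i_sig, j_sig = 600, 2000, 300, 500

# The j-dependent factor is the same for every row: compute it once.
j = np.arange(N)                                      # shape (N,)
gaus_j = np.exp(-((j - j_mid) ** 2) / (2 * j_sig ** 2))

arr = np.empty((M, N))
for i in range(M):  # M Python iterations instead of M * N
    gaus_i = np.exp(-((i - i_mid) ** 2) / (2 * i_sig ** 2))  # scalar
    arr[i] = gaus_i * gaus_j  # temporaries are length N, not M * N
```

This trades a little speed (M loop iterations) for tight memory use; for separable functions like this Gaussian it is nearly as fast as the fully vectorized version.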