3

I need to construct an N x M x N array A such that A[i, j, k] == (0 if i != k else x[j]). I could write:

A = np.zeros((N, M, N))
for i in range(N):
    for j in range(M):
        A[i,j,i] = x[j]

Or, alternatively:

A = np.zeros((N, M, N))
for i in range(N):
    A[i,:,i] = x

But both are most likely too slow for my purposes. Is there a faster way?

3
  • That depends a lot on the values of N and M and the dtype of A and x. Could you provide more details? Commented Jun 18, 2017 at 22:04
  • @MSeifert N and M are independent of each other. They range between 1 and 100 typically, but there's no real limit. Even though N and M are usually pretty small, this computation happens very frequently, and rarely they might be as large as 100K. A and x are both np.float32. Commented Jun 18, 2017 at 22:09
  • The problem is that array slice assignment is actually pretty fast (at least if you insert several values at once). So as long as M >> 1 you probably can't beat the second approach in the general case. If you had some limits on (or relationship between) N and M one could try to optimize it. Commented Jun 18, 2017 at 22:24

1 Answer 1

3

Approach #1

Using broadcasting to create all those linear indices where x's are to be assigned and then simply assign into its flattened view, like so -

# Initialize
Aout = np.zeros((N, M, N))

# Comput all linear indices
idx = (np.arange(N)*(N*M+1))[:,None] + N*np.arange(M)

# In a flattened view with `.ravel()` assign from x
Aout.ravel()[idx] = x

Approach #2

Using views based element access supported by np.lib.stride_tricks.as_strided -

Aout = np.zeros((N, M, N))
s0,s1,s2 = Aout.strides
Aoutview = np.lib.stride_tricks.as_strided(Aout,shape=(N,M),strides=(s0+s2,s1))
Aoutview[:] = x

Approach #3

Another approach would be to use integer array indexing along the first and third axes, closely simulating the second approach from the question, but in a vectorized manner -

Aout = np.zeros((N, M, N))
Aout[np.arange(N),:,np.arange(N)] = x

Runtime test

Approaches -

def app0(x,A):
    for i in range(N):
        for j in range(M):
            A[i,j,i] = x[j]
    return A

def app1(x,A):
    for i in range(N):
        A[i,:,i] = x
    return A

def app2(x,Aout):       
    idx = (np.arange(N)*(N*M+1))[:,None] + N*np.arange(M)
    Aout.ravel()[idx] = x
    return Aout

def app3(x,Aout):       
    s0,s1,s2 = Aout.strides
    Aoutview = np.lib.index_tricks.as_strided(Aout,shape=(N,M),strides=(s0+s2,s1))
    Aoutview[:] = x
    return Aout

def app4(x,Aout):
    r = np.arange(N)
    Aout[r,:,r] = x
    return Aout

Verification -

In [125]: # Params
     ...: N, M = 100,100
     ...: x = np.random.rand(M)
     ...: 
     ...: # Make copies of arrays to be assigned into
     ...: A0 = np.zeros((N, M, N))
     ...: A1 = np.zeros((N, M, N))
     ...: A2 = np.zeros((N, M, N))
     ...: A3 = np.zeros((N, M, N))
     ...: A4 = np.zeros((N, M, N))
     ...: 

In [126]: print np.allclose(app0(x,A0), app1(x,A1))
     ...: print np.allclose(app0(x,A0), app2(x,A2))
     ...: print np.allclose(app0(x,A0), app3(x,A3))
     ...: print np.allclose(app0(x,A0), app4(x,A4))
     ...: 
True
True
True
True

Timings -

In [127]: # Make copies of arrays to be assigned into
     ...: A0 = np.zeros((N, M, N))
     ...: A1 = np.zeros((N, M, N))
     ...: A2 = np.zeros((N, M, N))
     ...: A3 = np.zeros((N, M, N))
     ...: A4 = np.zeros((N, M, N))


In [128]: %timeit app0(x,A0)
     ...: %timeit app1(x,A1)
     ...: %timeit app2(x,A2)
     ...: %timeit app3(x,A3)
     ...: %timeit app4(x,A4)
     ...: 
1000 loops, best of 3: 1.49 ms per loop
10000 loops, best of 3: 53.6 µs per loop
10000 loops, best of 3: 150 µs per loop
10000 loops, best of 3: 28.6 µs per loop
10000 loops, best of 3: 25.2 µs per loop
Sign up to request clarification or add additional context in comments.

6 Comments

Is that really faster than the loop over range(N) and assigning to A[i, :, i]? On my computer it's just "slightly" faster (1-3%)
@MSeifert Depends on the loop iteration, I would think. What N did you use?
@MSeifert Not sure about your claims of 1-3%. I am getting much better timings on N,M = 100, 100 datasets as OP mentioned to be using.
Thanks for the runtime tests but I think you should add one using the second approach (that's the one I would use - if I would have that specific use-case).
I included the time it took to create the array (np.zeros). Without that it's app2 > app1 (second approach) > app3 (app2 being slowest). But the differences were quite small (~20%) even without np.zeros.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.