2

I have a pair of equal length numpy arrays. dwells contains float numbers representing dwell times, and ids represents a state. In my example there are only 3 unique states labeled 0, 1, 2.

dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])

Previous 2 arrays model a system that starts in state 2, stays there for 4.3 seconds, jumps to state 0, stays for 0.2 seconds and so on. I would like to generate another numpy array. It needs as many columns as dwells.sum(), each representing a whole number 0,1,2,3... indicating time. Each row matches one of the unique states (in this case 3). Each element of this array represents the relative contribution of each state during that time. For example, during the first 4 time points, only state 2 has any contribution, and thus the 1st 4 elements of row 2 are equal to 1. The fifth column has contribution from all 3 states but the sum equals 1.

[[0, 0, 0, 0, 0.2, 0, 0,  0,  0]
 [0, 0, 0, 0, 0.5, 1, 1, 0.5, 0]
 [1, 1, 1, 1, 0.3, 0, 0, 0.5, 1]]

I could do that with a for loop but I wonder if there is a more efficient vectorized way.

5
  • Is 0.1 your smallest possible time step? Commented Nov 25, 2019 at 18:28
  • No, dwells can have any length, even arbitrarily small Commented Nov 25, 2019 at 18:45
  • Aw, too bad. But I will leave my answer here. Maybe it can help someone to find a more general solution. Commented Nov 25, 2019 at 18:46
  • Often it helps to show the for loop solution first. That defines a clear target. Commented Nov 25, 2019 at 20:15
  • If your code is purely numerical, you can try to run your for loop using numba. Commented Nov 26, 2019 at 7:08

1 Answer 1

2

Assuming we have a smallest possible timestep of delta:

import numpy as np

dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])

def dwell_map(dwells, ids, delta=0.1):
    import numpy as np
    import sys

    idelta = 1 / delta

    # ensure that idelta is an integer number
    if not idelta.is_integer():
        raise ValueError("1/delta is not integer") 

    idelta = int(idelta)

    # create new longer dwells array
    dwells_l = (dwells*idelta).astype(int)

    # create target array
    a = np.zeros((ids.max()+1, dwells_l.sum().astype(int)), dtype=int)

    # create repeats of the ids according to the dwell time
    ind = np.repeat(ids, dwells_l)

    # put ones at the position where we have the indices
    a[ind, np.arange(ind.size)] = 1

    # reduce back to the original time resolution
    a = a.reshape(ids.max()+1, -1, idelta).sum(axis=2)/idelta

    return a

res = dwell_map(dwells, ids, 0.1)

This will only work well if delta is large enough, and the total duration is small enough, so that the intermediate arrays do not grow "infinitely" large.

The performance according to the iPython %timeit magic for your example arrays, to compare it to your for-loop solution:

10000 loops, best of 5: 58.5 µs per loop
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.