If you're prepared to get your hands dirty, you can speed it up quite significantly using Cython.
Original function, for reference:
import numpy as np
def original_indices(start, stop):
    index_list = []
    for i in range(len(start)):
        temp = range(start[i], stop[i])
        index_list.extend(temp)
    return np.array(index_list)
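For example, on a small input, each (start[i], stop[i]) pair contributes one contiguous run of indices, and the runs are concatenated in order:

>>> original_indices(np.array([0, 5]), np.array([3, 7]))
array([0, 1, 2, 5, 6])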
Cythonized version:
#!python
# cython: boundscheck=False
# cython: wraparound=False
import numpy as np
cimport numpy as np

def cython_indices(Py_ssize_t[:] start, Py_ssize_t[:] stop):
    cdef:
        Py_ssize_t final_size, count, ii, idx
        Py_ssize_t[:] index_array

    # first pass: work out how big the output needs to be
    final_size = 0
    for ii in range(start.shape[0]):
        final_size += stop[ii] - start[ii]

    # np.intp matches Py_ssize_t, so the memoryview assignment is portable
    index_array = np.empty(final_size, dtype=np.intp)

    # second pass: write each run of indices directly into the output
    count = 0
    for ii in range(start.shape[0]):
        idx = start[ii]
        while idx < stop[ii]:
            index_array[count] = idx
            idx += 1
            count += 1

    # convert the memoryview back to an ndarray, matching the original's return type
    return np.asarray(index_array)
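To compile this, one option is a minimal setup.py (a sketch, assuming the code above is saved as cython_indices.pyx; the numpy include directory is needed because of the cimport numpy line):

from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize("cython_indices.pyx"),
    include_dirs=[numpy.get_include()],  # headers required by `cimport numpy`
)

Build it in place with python setup.py build_ext --inplace, then import cython_indices as usual. If you're already in IPython, the %%cython cell magic is an even quicker way to experiment.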
Some fake data:
start = np.random.randint(0, 1000, size=100000).astype(np.intp)  # np.intp matches the Py_ssize_t[:] signature
stop = start + np.random.randint(0, 10, size=100000)
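Before timing, a quick sanity check that the two versions agree on this data:

assert np.array_equal(original_indices(start, stop),
                      cython_indices(start, stop))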
Some timings:
%timeit original_indices(start, stop)
# 10 loops, best of 3: 79.4 ms per loop
%timeit cython_indices(start, stop)
# 1000 loops, best of 3: 1.35 ms per loop
Cython speeds things up by roughly a factor of 60 compared with the original version.
Comments:

- np.concatenate([np.arange(frm, to) for frm, to in zip(start, stop)]), but vectorized?
- index_list += t, but it's currently quite fast (4 µs on my machine for small start/stop). How long is start? Does range(start[i], stop[i]) vary across rows, or is it constant?
- start and stop will have up to ~100k elements, and range(start[i], stop[i]) will also vary from having a length of 0 to about 10, I believe.
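For completeness, the vectorized one-liner from the comments as a runnable function (the name concatenated_indices is just illustrative; it avoids the Python-level extend loop, but still allocates one small temporary array per row):

import numpy as np

def concatenated_indices(start, stop):
    # one np.arange per (start, stop) pair, glued together at the end
    return np.concatenate([np.arange(frm, to) for frm, to in zip(start, stop)])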