Create a 2D array from another array and its indices with NumPy

Question

Given an array:

import numpy as np
arr = np.array([[1, 3, 7], [4, 9, 8]]); arr

array([[1, 3, 7],
       [4, 9, 8]])

And given its indices:

np.indices(arr.shape)

array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])

How would I be able to stack them neatly one against the other to form a new 2D array? This is what I'd like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

This is my current solution:

def foo(arr):
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T, arr.reshape(-1, 1)))

It works, but is there something shorter/more elegant to carry this operation out?

What happens if the array has a different dtype from np.intp? What type should the output be?
– Eric, Commented Aug 25, 2017 at 12:24
@Eric Ah, I see what you mean. If the array is a float, I think it is okay to cast the indices to float.
– cs95, Commented Aug 25, 2017 at 12:25
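To make the dtype point from the comments concrete, here is a minimal sketch that runs the question's `foo` on an integer array and on a float array. The float input values are made up for illustration. With a float array, `np.hstack` promotes the integer indices to float, which matches the casting described in the comment above.

```python
import numpy as np

def foo(arr):
    # The question's approach: (row, col) index pairs stacked next to the values.
    idx = np.indices(arr.shape).reshape(2, arr.size).T
    return np.hstack((idx, arr.reshape(-1, 1)))

# Integer input: indices and values share an integer dtype.
int_result = foo(np.array([[1, 3, 7], [4, 9, 8]]))

# Float input (illustrative values): hstack promotes the indices to float.
float_result = foo(np.array([[1.5, 3.0], [4.0, 9.25]]))

print(int_result.dtype)
print(float_result.dtype)  # float64
print(float_result)
```

If the indices need to stay integers while the values stay float, a single homogeneous 2D array cannot hold both; a structured array or a DataFrame would be needed instead.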

cs95 · Accepted Answer · 2017-08-25 13:19:49Z

This approach allocates the output array first and then uses broadcasted assignment to fill in the indices and the array values in separate steps -

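def indices_merged_arr(arr):
    m,n = arr.shape
    # Open (broadcastable) index grids: I has shape (m, 1), J has shape (1, n)
    I,J = np.ogrid[:m,:n]
    # One (row, col, value) triple per element; indices are cast to arr.dtype
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I
    out[...,1] = J
    out[...,2] = arr
    # Merge the first two axes in place to get one row per element
    out.shape = (-1,3)
    return out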

Note that we are avoiding the use of np.indices(arr.shape), which could have slowed things down.
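As a rough illustration of the difference, the sketch below compares what `np.ogrid` and `np.indices` return for a small shape. `np.ogrid` gives thin arrays that are only expanded by broadcasting at assignment time, while `np.indices` builds the full dense grids up front. The shape `(2, 3)` is just an example; no timing claim is made here.

```python
import numpy as np

m, n = 2, 3

# Open grids: one thin array per axis, meant to be broadcast against each other.
I, J = np.ogrid[:m, :n]
print(I.shape, J.shape)  # (2, 1) (1, 3)

# Dense grids: a full (m, n) array per axis, stacked along a new leading axis.
full = np.indices((m, n))
print(full.shape)  # (2, 2, 3)

# Broadcasted assignment from the open grids reproduces the dense grids.
out = np.empty((m, n, 2), dtype=int)
out[..., 0] = I
out[..., 1] = J
print(np.array_equal(out, np.moveaxis(full, 0, -1)))  # True
```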

Sample run -

In [10]: arr = np.array([[1, 3, 7], [4, 9, 8]])

In [11]: indices_merged_arr(arr)
Out[11]: 
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

Performance

arr = np.random.randn(100000, 2)

%timeit df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,\
                                arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
100 loops, best of 3: 4.97 ms per loop

%timeit pd.DataFrame(indices_merged_arr_divakar(arr), columns=['x', 'y', 'value'])
100 loops, best of 3: 3.82 ms per loop

%timeit pd.DataFrame(indices_merged_arr_eric(arr), columns=['x', 'y', 'value'], dtype=np.float32)
100 loops, best of 3: 5.59 ms per loop

Note: Timings include conversion to a pandas DataFrame, which is the eventual use case for this solution.

Okay, this looks simple. Would you consider adding some timings for larger 2D arrays, just for completeness?
@cᴏʟᴅsᴘᴇᴇᴅ Do you have a loopy solution that I could compare against?
I've edited the solution I have in my question as a function, if that helps. This is the only solution I have.
@cᴏʟᴅsᴘᴇᴇᴅ That doesn't seem to be any better. I guess a better one could use this one.

Eric · Accepted Answer · 2017-08-25 12:49:41Z

A more generic answer for n-dimensional arrays that handles other dtypes correctly:

def indices_merged_arr(arr):
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, arr.ndim),
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()

This returns a structured array with an index column and a value column, which can be of different types.
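A short usage sketch may help show what the structured result looks like. The function body is repeated from the answer so the snippet runs on its own; the float input array is made up for illustration.

```python
import numpy as np

def indices_merged_arr(arr):
    # Repeated from the answer above so this sketch is self-contained.
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, arr.ndim),
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()

# Illustrative float input: the values stay float, the indices stay integer.
arr = np.array([[1.5, 3.0, 7.0], [4.0, 9.0, 8.0]])
rec = indices_merged_arr(arr)

print(rec['index'])        # one (row, col) pair per element
print(rec['value'])        # the flattened values
print(rec['index'].dtype, rec['value'].dtype)
```

Fields are accessed by name, so the index and value parts never have to share a dtype, unlike the plain 2D result.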

edited Aug 25, 2017 at 12:49

answered Aug 25, 2017 at 12:31
