
Given an array:

arr = np.array([[1, 3, 7], [4, 9, 8]]); arr

array([[1, 3, 7],
       [4, 9, 8]])

And given its indices:

np.indices(arr.shape)

array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])

How can I stack them side by side to form a new 2D array, where each row holds the row index, the column index and the corresponding value? This is what I'd like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

This is my current solution:

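import numpy as np

def foo(arr):
    # Flatten the (2, m, n) index grid into (m*n, 2) rows of (row, col),
    # then append the flattened values as a third column.
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T, arr.reshape(-1, 1)))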

It works, but is there a shorter or more elegant way to carry out this operation?
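For comparison, a slightly shorter variant of the same idea is sketched below; it is not taken from either answer, and the name indices_with_values is hypothetical. It assumes the default C (row-major) order, in which np.indices(arr.shape).reshape(arr.ndim, -1).T yields one index tuple per element in the same order as arr.ravel(), so np.column_stack can append the values as the last column. Because it uses arr.ndim instead of a hard-coded 2, it also works for n-d arrays.

```python
import numpy as np

def indices_with_values(arr):
    # One row of indices per element, in C (row-major) order; works for any ndim.
    idx = np.indices(arr.shape).reshape(arr.ndim, -1).T
    # Append the flattened values as the last column; NumPy promotes both parts
    # to a common dtype (e.g. float64 if arr is a float array).
    return np.column_stack((idx, arr.ravel()))

arr = np.array([[1, 3, 7], [4, 9, 8]])
print(indices_with_values(arr))
```

Like the question's version, this still builds the dense index grid with np.indices, so it is shorter rather than faster.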

  • What happens if the array is a different data type to np.intp? What type should the output be? Commented Aug 25, 2017 at 12:24
  • @Eric Ah, I see what you mean. If the array is a float, I think it is okay to cast the indices to float. Commented Aug 25, 2017 at 12:25
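On the dtype question raised in these comments: with the question's hstack approach, the output dtype is NumPy's common type of the index dtype and arr.dtype, so a float array already yields float indices. A quick check, using an arbitrary sample float array:

```python
import numpy as np

def foo(arr):
    # The question's solution: indices as columns, values as the last column.
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,
                      arr.reshape(-1, 1)))

float_arr = np.array([[0.5, 1.5], [2.5, 3.5]])
out = foo(float_arr)
print(out.dtype)  # float64: the integer indices are cast to float
print(out)

# In general the output dtype is the common type of the index dtype and arr.dtype.
print(np.result_type(np.intp, np.float32))  # float64
print(np.result_type(np.intp, np.uint8))    # same as np.intp
```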

2 Answers


Initialize an output array, then use broadcasted assignment to fill in the row indices, the column indices and the array values in successive steps -

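import numpy as np

def indices_merged_arr(arr):
    m,n = arr.shape
    # Open grids: I has shape (m, 1), J has shape (1, n)
    I,J = np.ogrid[:m,:n]
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I    # row indices, broadcast across the columns
    out[...,1] = J    # column indices, broadcast across the rows
    out[...,2] = arr  # the values themselves
    # One (row, col, value) triple per element; a view, since out is contiguous
    return out.reshape(-1,3)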

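Note that this avoids np.indices(arr.shape), which materializes a dense index array of shape (2, m, n). np.ogrid only creates open grids of shapes (m, 1) and (1, n), which are broadcast during the assignment, saving both memory and time.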
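The difference is visible in the shapes involved: np.ogrid returns open grids that broadcast against each other, whereas np.indices builds the full dense grid. A small illustration, with m and n chosen arbitrarily:

```python
import numpy as np

m, n = 2, 3
I, J = np.ogrid[:m, :n]
print(I.shape, J.shape)          # (2, 1) (1, 3): only m + n index values stored
print(np.indices((m, n)).shape)  # (2, 2, 3): a dense grid of 2 * m * n values

# Assigning an open grid into an (m, n) slot broadcasts it to full size.
out = np.empty((m, n), dtype=int)
out[...] = I
print(out)
```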

Sample run -

In [10]: arr = np.array([[1, 3, 7], [4, 9, 8]])

In [11]: indices_merged_arr(arr)
Out[11]: 
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])
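To confirm that this gives the same result as the question's original approach on a larger input, a quick sketch follows. The random test arrays and the seed are arbitrary choices, and both functions are repeated so the snippet runs on its own.

```python
import numpy as np

def foo(arr):
    # The question's hstack-based solution.
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,
                      arr.reshape(-1, 1)))

def indices_merged_arr(arr):
    # This answer's preallocate-and-broadcast solution.
    m, n = arr.shape
    I, J = np.ogrid[:m, :n]
    out = np.empty((m, n, 3), dtype=arr.dtype)
    out[..., 0] = I
    out[..., 1] = J
    out[..., 2] = arr
    return out.reshape(-1, 3)

rng = np.random.default_rng(0)
int_arr = rng.integers(0, 100, size=(50, 7))
float_arr = rng.random((4, 5))
print(np.array_equal(foo(int_arr), indices_merged_arr(int_arr)))
print(np.array_equal(foo(float_arr), indices_merged_arr(float_arr)))
```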

Performance

arr = np.random.randn(100000, 2)

%timeit df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,\
                                arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
100 loops, best of 3: 4.97 ms per loop

%timeit pd.DataFrame(indices_merged_arr_divakar(arr), columns=['x', 'y', 'value'])
100 loops, best of 3: 3.82 ms per loop

%timeit pd.DataFrame(indices_merged_arr_eric(arr), columns=['x', 'y', 'value'], dtype=np.float32)
100 loops, best of 3: 5.59 ms per loop

Note: the timings include conversion to a pandas DataFrame, which is the eventual use case for this solution.


5 Comments

Okay, this looks simple. Would you consider adding some timings for larger 2D arrays, just for completeness?
@cᴏʟᴅsᴘᴇᴇᴅ Do you have a loopy solution that I could compare against?
I've edited the solution I have in my question as a function, if that helps. This is the only solution I have.
@cᴏʟᴅsᴘᴇᴇᴅ That doesn't seem to be any better. I guess a better one could build on this one.
Added some perf stats. Your solution is great!

A more generic answer that works for n-d arrays and handles other dtypes correctly:

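import numpy as np

def indices_merged_arr(arr):
    # One record per element: an ndim-long index subarray plus the value,
    # each keeping its own dtype
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, (arr.ndim,)),
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        # Shape the range for axis i so it broadcasts along that axis only
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()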

This returns a 1-D structured array with an 'index' field (one np.intp index per axis for each element) and a 'value' field in the input's dtype, so the indices and values can have different types.
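A usage sketch of the structured result, assuming a float input chosen for illustration: the fields can be read as ordinary arrays, and numpy.lib.recfunctions.structured_to_unstructured flattens the record into a plain 2-D array when one is needed (promoting everything to a common dtype, float64 here). The function is repeated so the snippet runs on its own.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

def indices_merged_arr(arr):
    # The structured-array answer, with the subarray shape written as a tuple.
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, (arr.ndim,)),
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        shape = (1,) * i + (-1,) + (1,) * (arr.ndim - 1 - i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()

arr = np.array([[1.5, 3.0, 7.0], [4.0, 9.0, 8.0]])
res = indices_merged_arr(arr)
print(res['index'])  # integer (6, 2) array of (row, col) pairs
print(res['value'])  # float (6,) array of values

# Flatten each record into one row of a plain 2-D array (dtype promoted to float64).
print(rfn.structured_to_unstructured(res))
```

Keeping the fields separate preserves the integer index dtype; the unstructured form is only needed when a single homogeneous array is required.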
