Combine array of indices with array of values

Question

I have an array in the following form where the first two columns are supposed to be indices of a 2-dimensional array and the following columns are arbitrary values.

data = np.array([[ 0. ,  1. , 48. ,  4. ],
                 [ 1. ,  2. , 44. ,  4.4],
                 [ 1. ,  1. , 34. ,  2.3],
                 [ 0. ,  2. , 55. ,  2.2],
                 [ 0. ,  0. , 42. ,  2. ],
                 [ 1. ,  0. , 22. ,  1. ]])

How do I combine the indices data[:,:2] with their values data[:,2:] such that the resulting array is accessible by the indices in the first two columns?

In my example that would be:

result = np.array([[[42. ,  2. ], [48. ,  4. ], [55. ,  2.2]],
                   [[22. ,  1. ], [34. ,  2.3], [44. ,  4.4]]])

I know that there is a trivial solution using Python loops, but performance is a concern since I'm dealing with a huge amount of data. Specifically, it's the output of another program that I need to process.

Maybe there is a relatively trivial NumPy solution as well, but I'm kind of stuck.

If it helps the following can be safely assumed:

All numbers in the first two columns are whole numbers (although the array consists of floats).
Every possible index (or rather combinations of indices) in the original array is used exactly once. I.e. there is guaranteed to be exactly one entry of the form [i, j, ...].
The indices start at 0 and I know the highest indices beforehand.
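The three assumptions above can be checked cheaply before reshaping. The following is only a sketch: the helper name check_index_assumptions is hypothetical and not part of the original question.

```python
import numpy as np

def check_index_assumptions(data):
    """Return True if the first two columns form a complete, duplicate-free index grid."""
    idx = data[:, :2]
    # Assumption 1: the indices are whole numbers, even though they are stored as floats.
    if not np.all(idx == np.round(idx)):
        return False
    idx = idx.astype(np.int64)
    # Assumption 3: the indices start at 0, so nothing may be negative.
    if idx.min() < 0:
        return False
    rows, cols = idx.max(axis=0) + 1
    # Assumption 2: flatten each (i, j) pair to one number and require every
    # possible value to appear exactly once.
    flat = idx[:, 0] * cols + idx[:, 1]
    counts = np.bincount(flat, minlength=rows * cols)
    return bool(np.all(counts == 1))

data = np.array([[0., 1., 48., 4. ],
                 [1., 2., 44., 4.4],
                 [1., 1., 34., 2.3],
                 [0., 2., 55., 2.2],
                 [0., 0., 42., 2. ],
                 [1., 0., 22., 1. ]])
print(check_index_assumptions(data))
```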

Edit:

Hmm. I see now how my example is misleading. The truth is that some of my input arrays are sorted, but that's unreliable. So I shouldn't assume anything about the order. I reordered some rows in my example to make it clearer. In case anyone wants to make sense of the answer and comment below: In my original question the array appeared to be sorted by the first two columns.

If data is sorted by indices and the result is regular, use data[:,2:].reshape(-1,int(data[:,1].max() + 1),2); otherwise please include a more realistic example.
– Michael Szczesny, Commented Jul 2, 2022 at 14:35
@MichaelSzczesny Thanks for the hint. I clarified my question.
– Scindix, Commented Jul 2, 2022 at 15:55
For the new requirements, sort before the reshape: data[np.lexsort(data[:,[1,0]].T)].
– Michael Szczesny, Commented Jul 2, 2022 at 15:55
Compiling an iterative solution with numba.njit would work without sorting data, if performance is the priority.
– Michael Szczesny, Commented Jul 2, 2022 at 15:59
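Putting the two suggestions from the comments together (sort with np.lexsort, then reshape) might look like the sketch below. It assumes every (i, j) combination occurs exactly once, as stated in the question; the variable names order, sorted_data, rows and cols are illustrative only.

```python
import numpy as np

data = np.array([[0., 1., 48., 4. ],
                 [1., 2., 44., 4.4],
                 [1., 1., 34., 2.3],
                 [0., 2., 55., 2.2],
                 [0., 0., 42., 2. ],
                 [1., 0., 22., 1. ]])

# np.lexsort treats the LAST key as the primary one:
# sort by column 0 first, and break ties by column 1.
order = np.lexsort((data[:, 1], data[:, 0]))
sorted_data = data[order]

# Grid dimensions: highest index + 1 along each axis.
rows = int(data[:, 0].max()) + 1
cols = int(data[:, 1].max()) + 1

# Drop the index columns; -1 lets NumPy infer the number of value columns.
result = sorted_data[:, 2:].reshape(rows, cols, -1)
print(result)
```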

I'mahdi · Accepted Answer · 2022-07-02 14:38:58Z


Determine the row, column, and depth sizes from your data array, then reshape the value columns as below (this relies on data being sorted by the first two columns):

import numpy as np
data = np.array([[ 0. ,  0. , 42. ,  2. ],
                 [ 0. ,  1. , 48. ,  4. ],
                 [ 0. ,  2. , 55. ,  2.2],
                 [ 1. ,  0. , 22. ,  1. ],
                 [ 1. ,  1. , 34. ,  2.3],
                 [ 1. ,  2. , 44. ,  4.4]])

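# Grid dimensions: highest index + 1 along each axis.
row = int(data[:, 0].max()) + 1
col = int(data[:, 1].max()) + 1
depth = data.shape[1] - 2  # number of value columns

# Drop the index columns and reshape the values into (row, col, depth).
# Only valid when the rows are sorted by (column 0, column 1).
out = data[:, 2:].reshape(row, col, depth)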
print(out)

Output:

[[[42.   2. ]
  [48.   4. ]
  [55.   2.2]]

 [[22.   1. ]
  [34.   2.3]
  [44.   4.4]]]


Scindix Over a year ago

My input array isn't necessarily sorted (see my edit). I could of course sort the array first like so: stackoverflow.com/a/46230001/3139807 I thought there might be a more efficient way that doesn't require sorting. But I guess it still beats looping over the array by a lot. So I'm sticking with this solution.
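For reference, plain NumPy can also place the values without sorting, by using the two index columns directly in an indexed assignment. This is a sketch rather than part of the answer above: the function name scatter_by_index is hypothetical, and it assumes each (i, j) pair occurs exactly once, as stated in the question.

```python
import numpy as np

def scatter_by_index(data):
    """Place data[:, 2:] into a (rows, cols, depth) array using columns 0 and 1 as indices."""
    # The indices are whole numbers stored as floats, so the cast is exact.
    idx = data[:, :2].astype(np.int64)
    rows, cols = idx.max(axis=0) + 1
    depth = data.shape[1] - 2
    # np.empty is safe here only because every cell is written exactly once.
    out = np.empty((rows, cols, depth), dtype=data.dtype)
    # Integer-array indexing: row k of the values goes to out[idx[k, 0], idx[k, 1]].
    out[idx[:, 0], idx[:, 1]] = data[:, 2:]
    return out

data = np.array([[0., 1., 48., 4. ],
                 [1., 2., 44., 4.4],
                 [1., 1., 34., 2.3],
                 [0., 2., 55., 2.2],
                 [0., 0., 42., 2. ],
                 [1., 0., 22., 1. ]])
result = scatter_by_index(data)
print(result)
```

If some index pairs could be missing, np.zeros or np.full would be a safer choice than np.empty, since unwritten cells would otherwise hold arbitrary values.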

Ali_Sh · Accepted Answer · 2022-07-02 20:08:02Z

You can use numba in no-python parallel mode with explicit loops (numba is designed to accelerate exactly this kind of Python loop). As Michael Szczesny mentioned in the comments, this should be one of the most efficient approaches in terms of performance, and it does not need sorting. This code is written for exactly two value columns; if that count can vary, it can be modified to handle that:

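import numba as nb
import numpy as np

# without signature --> @nb.njit(parallel=True)
@nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
    # Index columns as integers; int64 rather than int8, which overflows above 127.
    data_ = data[:, :2].astype(np.int64)
    res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, 2))
    # Each (i, j) pair occurs exactly once, so the parallel writes never collide.
    for i in nb.prange(data_.shape[0]):
        res[data_[i, 0], data_[i, 1], 0] = data[i, 2]
        res[data_[i, 0], data_[i, 1], 1] = data[i, 3]
    return res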

[Figure not reproduced: benchmark against the proposed NumPy code, measured without the sorting step; horizontal axis: data.shape[0]]

A more general version that handles more than 2 value columns:

@nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
    # int64 rather than int8, which overflows for indices above 127.
    data_ = data[:, :2].astype(np.int64)
    assert data_.shape[0] == data.shape[0]
    depth = data[:, 2:].shape[1]
    res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, depth))
    for i in nb.prange(data_.shape[0]):
        for j in range(depth):
            res[data_[i, 0], data_[i, 1], j] = data[i, j + 2]
    return res
