I have an array in the following form where the first two columns are supposed to be indices of a 2-dimensional array and the following columns are arbitrary values.
import numpy as np
data = np.array([[ 0. ,  1. , 48. ,  4. ],
                 [ 1. ,  2. , 44. ,  4.4],
                 [ 1. ,  1. , 34. ,  2.3],
                 [ 0. ,  2. , 55. ,  2.2],
                 [ 0. ,  0. , 42. ,  2. ],
                 [ 1. ,  0. , 22. ,  1. ]])
How do I combine the indices data[:,:2] with their values data[:,2:] so that the resulting array can be indexed by the values in the first two columns?
In my example that would be:
result = np.array([[[42. ,  2. ], [48. ,  4. ], [55. ,  2.2]],
                   [[22. ,  1. ], [34. ,  2.3], [44. ,  4.4]]])
I know that there is a trivial solution using Python loops, but performance is a concern since I'm dealing with a huge amount of data. Specifically, it's the output of another program that I need to process.
Maybe there is a relatively trivial NumPy solution as well, but I'm kind of stuck.
If it helps, the following can be safely assumed:
- All numbers in the first two columns are whole numbers (although the array consists of floats).
- Every possible index (or rather, combination of indices) is used exactly once in the original array, i.e. there is guaranteed to be exactly one entry of the form [i, j, ...].
- The indices start at 0 and I know the highest indices beforehand.
Edit:
Hmm. I see now how my example is misleading. The truth is that some of my input arrays are sorted, but that's unreliable, so I shouldn't assume anything about the order. I reordered some rows in my example to make that clearer. In case anyone wants to make sense of the answer and comment below: in my original question the array appeared to be sorted by the first two columns.


If the rows are already sorted by the first two columns: data[:,2:].reshape(-1,int(data[:,1].max() + 1),2), otherwise please include a more realistic example.
To sort the rows first: data[np.lexsort(data[:,[1,0]].T)].
With numba.njit, an iterative solution would work without sorting data if performance is the priority.
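The sort-then-reshape idea from the comments can be sketched as below, using the example data from the question. The function names by_sorting and by_scatter are made up for illustration. by_scatter is not taken from the comments: it is an alternative sketch that assumes, as stated in the question, that every (i, j) pair occurs exactly once and that the index columns hold whole numbers, and it writes the values straight into place without sorting.

```python
import numpy as np

data = np.array([[0., 1., 48., 4. ],
                 [1., 2., 44., 4.4],
                 [1., 1., 34., 2.3],
                 [0., 2., 55., 2.2],
                 [0., 0., 42., 2. ],
                 [1., 0., 22., 1. ]])

def by_sorting(data):
    """Sort rows by (column 0, column 1), then reshape the value columns."""
    # np.lexsort treats the LAST key as the primary one, so passing
    # columns [1, 0] sorts by column 0 first and column 1 second.
    ordered = data[np.lexsort(data[:, [1, 0]].T)]
    n_second = int(data[:, 1].max()) + 1   # size along the second index
    n_values = data.shape[1] - 2           # number of value columns
    return ordered[:, 2:].reshape(-1, n_second, n_values)

def by_scatter(data):
    """Assumed alternative: assign values directly via integer indexing."""
    i = data[:, 0].astype(int)
    j = data[:, 1].astype(int)
    # np.empty is safe only because every (i, j) slot is assumed to be filled.
    result = np.empty((i.max() + 1, j.max() + 1, data.shape[1] - 2))
    result[i, j] = data[:, 2:]
    return result

result = by_sorting(data)
```

With either function, result[i, j] gives the value columns of the row whose first two entries are i and j. The scatter version avoids the sort, which may matter for very large inputs, but that is worth measuring on real data rather than taking for granted.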