Tricky Python array sorting

Question

I'm currently loading data of the following form into memory:

5.579158e-19    0   0
5.678307e-19    1   0
...
6.041513e-19    27  0
5.938317e-19    28  0
...
5.978803e-19    38  1
5.590008e-19    39  1 
5.588807e-19    0   2
5.670948e-19    1   2
...

and so on with the command:

import numpy as np
data_res = np.genfromtxt('/path/data.csv',delimiter=';', dtype = float)

What I want is a 40x40 matrix mat whose indices are the entries in the second and third columns. The first entry, mat[0,0] = data_res[0,0], is easy, but the problem is that the list is not sorted, and the entries in the second and third columns are floats, so I can't use them as indices.

I've tried a double for loop method but it does not work properly.

mat = np.zeros((40,40))

for k in range(0,40):
    for j in range(0,40):
        mat[k,j] = data_res[k*j,0]

Wouldn't this method work if the index ran from 1-40 and not 0-39?

Thanks.

Warren Weckesser · Accepted Answer · 2015-01-06 21:37:50Z

This can be done with no explicit loops. I'll use a smaller data set, and create a 10x10 array mat. If an index (i,j) is not in the CSV file, mat[i,j] will be 0.

Here's the input file:

In [27]: !cat data.csv
0.1    0   0
0.2    1   0
0.3    7   0
0.4    8   0
0.5    8   1
0.6    9   1 
0.7    0   2
0.8    1   2
0.9    9   9

Use genfromtxt to read the data into a structured array with three fields, values, i and j.

In [28]: data = np.genfromtxt('data.csv', dtype=None, names=['values', 'i', 'j'])

By using dtype=None, we're telling genfromtxt to determine the data type based on what is found in the file. In this case, the 'values' field will be floating point, and the fields 'i' and 'j' will be integer.
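To check what genfromtxt inferred, you can inspect the dtype of the result. The following is a minimal sketch that reads the same kind of data from an in-memory string instead of data.csv; the io.StringIO stand-in and the sample rows are only for illustration.

```python
import io
import numpy as np

# Stand-in for data.csv: whitespace-separated columns value, i, j.
text = """0.1 0 0
0.2 1 0
0.5 8 1
"""

data = np.genfromtxt(io.StringIO(text), dtype=None, names=['values', 'i', 'j'])

# One type is inferred per column.
print(data.dtype)
print(data['values'].dtype.kind)  # kind 'f' means floating point
print(data['i'].dtype.kind)       # kind 'i' means signed integer
```

Because 'i' and 'j' come back as integers, they can be used directly as indices.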

Create the array mat:

In [29]: mat = np.zeros((10, 10))

Assign the data to mat:

In [30]: mat[data['i'], data['j']] = data['values']

In [31]: mat
Out[31]: 
array([[ 0.1,  0. ,  0.7,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.2,  0. ,  0.8,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.3,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.4,  0.5,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0.6,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.9]])
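If the data has already been loaded as a plain float array, as with data_res in the question, the same assignment should still work once the index columns are cast to integers. A small sketch, assuming the index columns hold whole numbers stored as floats (the sample array below is made up for illustration):

```python
import numpy as np

# Made-up stand-in for data_res: columns are value, i, j, all floats.
data_res = np.array([[0.1, 0.0, 0.0],
                     [0.7, 0.0, 2.0],
                     [0.5, 8.0, 1.0],
                     [0.9, 9.0, 9.0]])

# Float arrays cannot be used as indices, so convert the index columns.
i = data_res[:, 1].astype(int)
j = data_res[:, 2].astype(int)

mat = np.zeros((10, 10))
mat[i, j] = data_res[:, 0]  # the order of the rows in data_res does not matter

print(mat[0, 2], mat[8, 1], mat[9, 9])
```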

Ashwini Chaudhary · Accepted Answer · 2015-01-06 20:58:16Z

If I understood your question correctly, you want to sort your array based on the indices. For that you can use numpy.lexsort:

>>> arr = np.arange(16).reshape(4, 4).astype(float)
>>> x, y = arr.shape
>>> indices = np.vstack(np.unravel_index(np.arange(x*y), (y, x))).T
>>> np.random.shuffle(indices)
>>> arr = np.hstack((arr.flatten()[:, None], indices))
>>> arr  # now this looks like your dataset, first column is data and other two are indices
array([[  0.,   1.,   3.],
       [  1.,   1.,   2.],
       [  2.,   3.,   0.],
       [  3.,   0.,   1.],
       [  4.,   0.,   0.],
       [  5.,   2.,   0.],
       [  6.,   0.,   2.],
       [  7.,   2.,   3.],
       [  8.,   3.,   2.],
       [  9.,   0.,   3.],
       [ 10.,   3.,   1.],
       [ 11.,   1.,   0.],
       [ 12.,   3.,   3.],
       [ 13.,   1.,   1.],
       [ 14.,   2.,   2.],
       [ 15.,   2.,   1.]])
>>> arr[np.lexsort((arr[:, 2], arr[:,1]))][:,0].reshape(4, 4)
array([[  4.,   3.,   6.,   9.],
       [ 11.,  13.,   1.,   0.],
       [  5.,  15.,  14.,   7.],
       [  2.,  10.,   8.,  12.]])
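One thing to keep in mind with np.lexsort is that the last key in the tuple is the primary sort key. A minimal sketch on a tiny made-up array, assuming every (i, j) pair occurs exactly once so that the reshape is valid:

```python
import numpy as np

# Made-up rows: value, i, j (shuffled), covering a full 2x2 grid.
arr = np.array([[30., 1., 0.],
                [10., 0., 1.],
                [40., 1., 1.],
                [20., 0., 0.]])

# Keys are given from least to most significant:
# rows are ordered by i (column 1), and by j (column 2) within equal i.
order = np.lexsort((arr[:, 2], arr[:, 1]))

mat = arr[order][:, 0].reshape(2, 2)
print(mat)
```

If some (i, j) pairs are missing from the data, the reshape no longer lines up with the indices, and assigning by index is the safer route.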

rawkintrevo · Accepted Answer · 2015-01-06 21:08:30Z

Your loop isn't working because data_res[k*j,0] isn't doing what I think you want it to do.

To get the desired result try data_res[(k*40)+j,0].

dim = 40
mat = np.zeros((dim,dim))

for k in range(0,dim):
    for j in range(0,dim):
        mat[k,j] = data_res[(k*dim)+j,0]

This is based on the assumption that your indices are in fact already sorted. As ajcr points out, if they aren't, you'll need a different approach.
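To see why k*j cannot work as a row index, it helps to count how many distinct rows each formula reaches. A short sketch, using the same dim as in the loop above:

```python
dim = 40

# Row numbers visited by each formula over the full double loop.
rows_product = {k * j for k in range(dim) for j in range(dim)}
rows_flat = {k * dim + j for k in range(dim) for j in range(dim)}

# k*j collides heavily (for example, it is 0 whenever k or j is 0),
# while k*dim + j reaches every row from 0 to dim*dim - 1.
print(len(rows_product), len(rows_flat))
print(rows_flat == set(range(dim * dim)))
```

Since the loop visits dim*dim pairs and k*dim + j produces dim*dim distinct row numbers, each row is read exactly once.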

UPDATE: The second example provided by Hooked is a much cleaner way to do this, and a more robust solution.

Hooked · Accepted Answer · 2015-01-06 21:02:35Z

Since your matrix is so small (40x40), a pure Python solution for reading the file and filling a NumPy array might be better for you:

raw = '''5.579158e-19    0   0
5.678307e-19    1   0
6.041513e-19    27  0
5.588807e-19    0   2
5.670948e-19    1   2'''

import numpy as np
mat = np.zeros((40,40))

for line in raw.split('\n'):
    z,i,j = line.split()
    mat[int(i),int(j)]=float(z)

print(mat)

The example above uses a string as a stand-in for the contents of a file. If the file were called data.txt, you would instead run:

with open("data.txt") as FIN:
    for line in FIN:
        z,i,j = line.split()
        mat[int(i),int(j)]=float(z)

labzus · Accepted Answer · 2015-01-06 21:03:24Z

Try this:

mat = np.zeros((40,40))

# Use row i of data_res; the index columns are floats, so cast them to int.
for i in range(len(data_res)):
    mat[int(data_res[i,1]), int(data_res[i,2])] = data_res[i,0]
