
Currently, I'm loading in some data into memory of the form:

5.579158e-19    0   0
5.678307e-19    1   0
...
6.041513e-19    27  0
5.938317e-19    28  0
...
5.978803e-19    38  1
5.590008e-19    39  1 
5.588807e-19    0   2
5.670948e-19    1   2
...

and so on with the command:

import numpy as np
data_res = np.genfromtxt('/path/data.csv',delimiter=';', dtype = float)

What I want is a 40x40 matrix mat whose indices are the entries in the second and third columns. The first entry, mat[0,0] = data_res[0,0], is easy, but the problem is that the list is not sorted, and the entries in the second and third columns are floats, so I can't use them directly as indices.

I've tried a double for loop, but it doesn't work properly.

mat = np.zeros((40,40))

for k in range(0,40):
    for j in range(0,40):
        mat[k,j] = data_res[k*j,0]

Wouldn't this method work if the index ran from 1-40 and not 0-39?

Thanks.


5 Answers


This can be done with no explicit loops. I'll use a smaller data set, and create a 10x10 array mat. If an index (i,j) is not in the CSV file, mat[i,j] will be 0.

Here's the input file:

In [27]: !cat data.csv
0.1    0   0
0.2    1   0
0.3    7   0
0.4    8   0
0.5    8   1
0.6    9   1 
0.7    0   2
0.8    1   2
0.9    9   9

Use genfromtxt to read the data into a structured array with three fields: 'values', 'i' and 'j'.

In [28]: data = np.genfromtxt('data.csv', dtype=None, names=['values', 'i', 'j'])

By using dtype=None, we're telling genfromtxt to determine the data type based on what is found in the file. In this case, the 'values' field will be floating point, and the fields 'i' and 'j' will be integer.
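To check what dtype=None actually infers, here's a small sketch that feeds a few of the sample rows through io.StringIO instead of a file (using StringIO is my assumption, just to keep it self-contained). The exact integer width can differ between platforms, so only the kind of each field matters here.

```python
import io

import numpy as np

# A few rows in the same whitespace-separated layout as data.csv above
text = io.StringIO("0.1 0 0\n0.2 1 0\n0.9 9 9\n")

# dtype=None lets genfromtxt infer one type per column
data = np.genfromtxt(text, dtype=None, names=['values', 'i', 'j'])

print(data.dtype)            # field names with their inferred types
print(data['i'], data['j'])  # the index columns, inferred as integers
```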

Create the array mat:

In [29]: mat = np.zeros((10, 10))

Assign the data to mat:

In [30]: mat[data['i'], data['j']] = data['values']

In [31]: mat
Out[31]: 
array([[ 0.1,  0. ,  0.7,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.2,  0. ,  0.8,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.3,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.4,  0.5,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0.6,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0.9]])
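The same fancy-indexing assignment also works on the plain float array data_res from the question; the index columns just need to be cast to integers first. A minimal sketch, with a small hypothetical data_res standing in for the real file:

```python
import numpy as np

# Hypothetical stand-in for data_res: column 0 = value, columns 1-2 = indices stored as floats
data_res = np.array([
    [5.579158e-19, 0, 0],
    [5.678307e-19, 1, 0],
    [5.588807e-19, 0, 2],
    [5.590008e-19, 39, 1],
])

mat = np.zeros((40, 40))

# Float arrays can't be used as indices, so cast the index columns to int
rows = data_res[:, 1].astype(int)
cols = data_res[:, 2].astype(int)

# One vectorized assignment instead of a double loop
mat[rows, cols] = data_res[:, 0]
```

astype(int) truncates toward zero, which is fine for indices stored exactly as 0.0, 1.0, and so on; if they might carry rounding noise, np.rint(...).astype(int) is safer. If you need to tell missing entries apart from genuine zeros, np.full((40, 40), np.nan) can replace np.zeros.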

If I understand your question correctly, you want to sort your array by its indices. For that you can use numpy.lexsort:

>>> arr = np.arange(16).reshape(4, 4).astype(float)
>>> x, y = arr.shape
>>> indices = np.vstack(np.unravel_index(np.arange(x*y), (y, x))).T
>>> np.random.shuffle(indices)
>>> arr = np.hstack((arr.flatten()[:, None], indices))
>>> arr  # now this looks like your dataset, first column is data and other two are indices
array([[  0.,   1.,   3.],
       [  1.,   1.,   2.],
       [  2.,   3.,   0.],
       [  3.,   0.,   1.],
       [  4.,   0.,   0.],
       [  5.,   2.,   0.],
       [  6.,   0.,   2.],
       [  7.,   2.,   3.],
       [  8.,   3.,   2.],
       [  9.,   0.,   3.],
       [ 10.,   3.,   1.],
       [ 11.,   1.,   0.],
       [ 12.,   3.,   3.],
       [ 13.,   1.,   1.],
       [ 14.,   2.,   2.],
       [ 15.,   2.,   1.]])
>>> arr[np.lexsort((arr[:, 2], arr[:,1]))][:,0].reshape(4, 4)
array([[  4.,   3.,   6.,   9.],
       [ 11.,  13.,   1.,   0.],
       [  5.,  15.,  14.,   7.],
       [  2.,  10.,   8.,  12.]])
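The reshape at the end only gives the right matrix if every (i, j) pair appears exactly once. Here's a hedged sketch that wraps the lexsort approach in a hypothetical helper, to_matrix_via_lexsort, with that check added. Note that np.lexsort treats the last key as the primary one, so passing (j, i) sorts by i first and breaks ties with j.

```python
import numpy as np


def to_matrix_via_lexsort(arr, n):
    """Hypothetical helper: arr has rows (value, i, j) with integer-valued i, j.

    Requires every (i, j) pair in [0, n) x [0, n) to appear exactly once,
    because the final reshape relies on a complete, duplicate-free grid.
    """
    i = arr[:, 1].astype(int)
    j = arr[:, 2].astype(int)
    in_range = np.all((i >= 0) & (i < n) & (j >= 0) & (j < n))
    unique_pairs = len(set(zip(i.tolist(), j.tolist())))
    if len(arr) != n * n or unique_pairs != n * n or not in_range:
        raise ValueError("expected each (i, j) pair exactly once")
    # np.lexsort sorts by the LAST key first: order by i, then by j within each i
    order = np.lexsort((j, i))
    return arr[order, 0].reshape(n, n)


# Usage: build a shuffled 3x3 dataset whose correct matrix is known
expected = np.arange(9, dtype=float).reshape(3, 3)
ii, jj = np.indices((3, 3))
rows = np.column_stack((expected.ravel(), ii.ravel(), jj.ravel()))
shuffled = rows[[4, 0, 8, 2, 6, 1, 7, 3, 5]]  # fixed permutation of the rows

result = to_matrix_via_lexsort(shuffled, 3)

try:
    to_matrix_via_lexsort(shuffled[:-1], 3)  # one pair missing
except ValueError as exc:
    print("rejected:", exc)
```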


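Your loop isn't working because data_res[k*j,0] doesn't do what I think you want it to do: k*j maps many different (k, j) pairs to the same row (for example, every pair with k = 0 or j = 0 reads row 0), and it never reaches any row above 39*39 = 1521.

To get the desired result, try data_res[(k*40)+j,0], which gives each (k, j) pair its own row.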

dim = 40
mat = np.zeros((dim,dim))

for k in range(0,dim):
    for j in range(0,dim):
        mat[k,j] = data_res[(k*dim)+j,0]

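This assumes that your indices are in fact already sorted. As ajcr points out, if they aren't, you'll need a different approach. Also note that in the sample shown in the question the second column varies fastest, so row k*40 + j holds the entry whose second-column index is j and whose third-column index is k; if you want the second column to be the row index, use mat.T (or assign to mat[j,k] instead).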
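A quick way to see the difference between the two row formulas is to count how many distinct rows each one touches on a 40x40 grid. The sketch below also checks that k*dim + j is the row-major flat index NumPy computes with np.ravel_multi_index, and that, for data already sorted this way, the double loop produces the same matrix as a single reshape. values is a hypothetical stand-in for data_res[:, 0].

```python
import numpy as np

dim = 40

# Rows that data_res[k*j, 0] would read: many (k, j) pairs collide
products = {k * j for k in range(dim) for j in range(dim)}
# Rows that data_res[k*dim + j, 0] reads: one distinct row per (k, j)
flat = {k * dim + j for k in range(dim) for j in range(dim)}

print(len(products), len(flat))

# k*dim + j is the row-major flat index, which NumPy can also compute
assert np.ravel_multi_index((3, 5), (dim, dim)) == 3 * dim + 5

# Hypothetical sorted values standing in for data_res[:, 0]
values = np.arange(dim * dim, dtype=float)

# The double loop over values[k*dim + j] ...
mat_loop = np.zeros((dim, dim))
for k in range(dim):
    for j in range(dim):
        mat_loop[k, j] = values[k * dim + j]

# ... is equivalent to a single reshape
mat_reshape = values.reshape(dim, dim)
```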

UPDATE: The second example provided by hooked is a much cleaner way to do this, and a more robust solution.


Since your matrix is so small (40x40), a pure Python solution that reads the file and fills in a NumPy array might be better for you:

raw = '''5.579158e-19    0   0
5.678307e-19    1   0
6.041513e-19    27  0
5.588807e-19    0   2
5.670948e-19    1   2'''

import numpy as np
mat = np.zeros((40,40))

for line in raw.split('\n'):
    z,i,j = line.split()
    mat[int(i),int(j)]=float(z)

print(mat)

The example above uses a string in place of a file. If the file were called data.txt, you would instead run:

with open("data.txt") as FIN:
    for line in FIN:
        z,i,j = line.split()
        mat[int(i),int(j)]=float(z)
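The same loop can be wrapped in a small function. This is a sketch with a hypothetical read_matrix helper: it skips empty lines, which would otherwise break the three-way unpacking, and takes a sep argument, since the question's genfromtxt call uses delimiter=';' while the sample rows look whitespace-separated. The usage part writes a made-up sample to a temporary file.

```python
import os
import tempfile

import numpy as np


def read_matrix(path, dim=40, sep=None):
    """Hypothetical helper: fill a dim x dim matrix from lines of 'value i j'.

    sep=None splits on any whitespace; pass sep=';' for semicolon-delimited files.
    """
    mat = np.zeros((dim, dim))
    with open(path) as fin:
        for line in fin:
            if not line.strip():
                continue  # skip empty lines instead of failing to unpack them
            z, i, j = line.split(sep)
            # int() and float() ignore surrounding whitespace such as the trailing newline
            mat[int(i), int(j)] = float(z)
    return mat


# Usage: write a small semicolon-delimited sample (including an empty line) and read it back
sample = "5.579158e-19;0;0\n5.678307e-19;1;0\n\n5.588807e-19;0;2\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    mat = read_matrix(path, sep=";")
finally:
    os.remove(path)
```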


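Try this:

mat = np.zeros((40,40))

for i in range(len(data_res)):
    # column 0 holds the value; columns 1 and 2 hold the indices as floats, so cast them to int
    mat[int(data_res[i, 1]), int(data_res[i, 2])] = data_res[i, 0]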
