I've been working on Python for around 2 months now so I have a OK understanding of it.
My goal is to create a matrix using CSV data, then populating that matrix from the data in the 3rd column of that CSV file.
I came up with this code thus far:
import csv
import csv
def readcsv(csvfile_name):
with open(csvfile_name) as csvfile:
file=csv.reader(csvfile, delimiter=",")
#remove rubbish data in first few rows
skiprows = int(input('Number of rows to skip? '))
for i in range(skiprows):
_ = next(file)
#change strings into integers/floats
for z in file:
z[:2]=map(int, z[:2])
z[2:]=map(float, z[2:])
print(z[:2])
return
After removing the rubbish data with the above code, the data in the CSV file looks like this:
Input:
1 1 51 9 3
1 2 39 4 4
1 3 40 3 9
1 4 60 2 .
1 5 80 2 .
2 1 40 6 .
2 2 28 4 .
2 3 40 2 .
2 4 39 3 .
3 1 10 . .
3 2 20 . .
3 3 30 . .
3 4 40 . .
. . . . .
The output should look like this:
1 2 3 4 . .
1 51 39 40 60
2 40 28 40 39
3 10 20 30 40
.
.
There are about a few thousand rows and columns in this CSV file, however I'm only interested is the first 3 columns of the CSV file. So the first and second columns are basically like co-ordinates for the matrix, and then populating the matrix with data in the 3rd column.
After lots of trial and error, I realised that numpy was the way to go with matrices. This is what I tried thus far with example data:
left_column = [1, 2, 1, 2, 1, 2, 1, 2]
middle_column = [1, 1, 3, 3, 2, 2, 4, 4]
right_column = [1., 5., 3., 7., 2., 6., 4., 8.]
import numpy as np
m = np.zeros((max(left_column), max(middle_column)), dtype=np.float)
for x, y, z in zip(left_column, middle_column, right_column):
x -= 1 # Because the indicies are 1-based
y -= 1 # Need to be 0-based
m[x, y] = z
print(m)
#: array([[ 1., 2., 3., 4.],
#: [ 5., 6., 7., 8.]])
However, it is unrealistic for me to specify all of my data in my script to generate the matrix. I tried using the generators to pull the data out of my CSV file but it didn't work well for me.
I learnt as much numpy as I could, however it appears like it requires my data to already be in matrix form, which it isn't.