Create a matrix from a text file - python

Question

I would like to create a matrix from a three column file. I am sure it's something extremely easy, but I just do not understand how it needs to be done. Please be gentle, I am a beginner to python. Thank you

The format of my input file

A A 5 
A B 4 
A C 3 
B B 2 
B C 1 
C C 0

Desired output - complete matrix

Or - half matrix

I tried this, but as I said, I am VERY new to python and programming.

import numpy as np

for line in file('test').readlines():
    name1, name2, value = line.strip().split('\t')

a = np.matrix([[name1], [name2], [value]])
print a

WORKING SCRIPT - One of my friends also helped me, so if anyone is interested in a simpler script, here it is. It's not the most efficient, but it works perfectly.

data = {}
names = set([])

for line in open('test'):
    name1, name2, value = line.strip().split('\t')
    data[(name1, name2)] = value
    names.update([name1, name2])  # collect labels from both columns

names = sorted(names)
print(names)
print(data)

output = open('out.txt', 'w')

output.write("\t%s\n" % ("\t".join(names)))
for nameA in names:
    output.write("%s" % nameA)
    for nameB in names:
        key = (nameA, nameB)
        if key in data:
            output.write("\t%s" % data[(nameA, nameB)]) 
        else:
            output.write("\t")  
    output.write("\n")


output.close()

Have you tried at least reading in the file contents? Can you show some code? The idea would be to read in the file contents without the special characters (see strip()). You can then retrieve the individual values by splitting each line with split(" "). After that, it depends on whether you want a numpy matrix or are satisfied with nested lists.
– ljetibo, Commented May 11, 2015 at 11:15

Moritz · Accepted Answer · 2015-05-11 12:30:38Z

4

Try:

import pandas as pd
import numpy as np

# Read each line as a [row, column, value] triple
raw = []
with open('test.txt','r') as f:
    for line in f:
        raw.append(line.split())
data = pd.DataFrame(raw,columns = ['row','column','value'])
# Pivot the 'column' labels into columns; missing pairs become NaN
data_ind = data.set_index(['row','column']).unstack('column')
np.array(data_ind.values,dtype=float)

Output:

array([[  5.,   4.,   3.],
       [ nan,   2.,   1.],
       [ nan,  nan,   0.]])
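
If the complete (symmetric) matrix is wanted rather than the half matrix, the NaN entries can be filled in from the transpose. This is only a minimal sketch, assuming the upper-triangular array shown in the output above; the names `half` and `full` are illustrative and not part of the original answer:

```python
import numpy as np

# Half matrix as in the output above: NaN marks pairs missing from the file
half = np.array([[5.0, 4.0, 3.0],
                 [np.nan, 2.0, 1.0],
                 [np.nan, np.nan, 0.0]])

# Where an entry is NaN, take the mirrored entry from the transpose instead
full = np.where(np.isnan(half), half.T, half)
print(full)
```

This relies on every pair appearing in the file in at least one direction; a pair that is missing in both directions would stay NaN.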

edited May 11, 2015 at 12:30

answered May 11, 2015 at 11:34

Moritz


2 Comments

Viki Over a year ago

thank you so much! It does exactly what I asked for! Have a nice day!

Moritz Over a year ago

It is worth the time to dig a bit into pandas. Look at the DataFrame after set_index and unstack. I find it very useful, e.g. for the analysis of microtiter plates.
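
To look at the DataFrame after set_index and unstack, as the comment suggests, something like the following sketch may help. It assumes the six rows from the question are typed in directly instead of being read from a file, and `wide` is just an illustrative name:

```python
import pandas as pd

# The rows from the question, entered by hand for illustration
raw = [['A', 'A', '5'], ['A', 'B', '4'], ['A', 'C', '3'],
       ['B', 'B', '2'], ['B', 'C', '1'], ['C', 'C', '0']]
data = pd.DataFrame(raw, columns=['row', 'column', 'value'])

# Selecting the single 'value' column before unstacking keeps the
# column labels flat (A, B, C) instead of a two-level header
wide = data.set_index(['row', 'column'])['value'].unstack('column')
print(wide)
```

The printed table has the row labels as the index and the column labels as the header, with NaN wherever a pair does not occur in the input.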

ljetibo · Answer · 2015-05-11 11:57:40Z

Although there's already an accepted answer, it uses pandas. A relatively generic way of getting the same effect without an additional library is this (numpy is used because the OP specified numpy, but you can achieve the same thing with lists):

import string
import numpy as np

# Letters A-Z; a label's position in this list is its row/column index
uppercase = list(string.ascii_uppercase)

matrix = np.zeros((3, 3))

with open("a.txt") as f:
    for line in f:
        tmp = line.strip()
        tmp = tmp.split(" ")
        idx = uppercase.index(tmp[0])
        idy = uppercase.index(tmp[1])
        matrix[idx, idy] = tmp[2]  # numpy converts the string to a float

The idea is to gather all the letters of the alphabet; hopefully the OP will limit themselves to just the English alphabet without special chars (šđćžčę°e etc.).

We create a list from the alphabet so that we can use the index method to retrieve the index value, i.e. uppercase.index("A") is 0. We can use these indices to fill in our array.

Read in file line by line, strip extra characters, split by space to get:

['A', 'A', '5']
['A', 'B', '4']

This is now the actual working part:

    idx = uppercase.index(tmp[0])
    idy = uppercase.index(tmp[1])
    matrix[idx, idy] = tmp[2]

I.e. for the line "A A 5", idx evaluates to 0 and so does idy, so matrix[0,0] becomes the value of tmp[2], which is 5. Following the same logic for the line "A B 4", we get matrix[0,1] = 4. And so on.

A more general approach would be to declare matrix = np.zeros((26, 26)) instead of matrix = np.zeros((3, 3)), because there are 26 letters in the English alphabet and the OP doesn't have to use just "ABC", but could potentially use the entire range A-Z.

Example output of the program above:

>>> matrix
array([[ 5.,  4.,  3.],
       [ 0.,  2.,  1.],
       [ 0.,  0.,  0.]])
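
The same idea can be sketched with plain nested lists and without assuming single-letter labels, by building the label-to-index mapping from the data itself. In this sketch the `lines` list stands in for the file contents, and `build_matrix` is a hypothetical helper name, not something from the answer above:

```python
def build_matrix(lines):
    """Return (labels, matrix) built from 'row column value' lines."""
    rows = [line.split() for line in lines if line.strip()]
    # Sorted, de-duplicated labels taken from both name columns
    labels = sorted({name for r in rows for name in r[:2]})
    index = {name: i for i, name in enumerate(labels)}
    # Start from zeros, mirroring np.zeros in the numpy version
    matrix = [[0.0] * len(labels) for _ in labels]
    for name1, name2, value in rows:
        matrix[index[name1]][index[name2]] = float(value)
    return labels, matrix


lines = ["A A 5", "A B 4", "A C 3", "B B 2", "B C 1", "C C 0"]
labels, matrix = build_matrix(lines)
print(labels)
for row in matrix:
    print(row)
```

Because the matrix is sized from the labels actually present, there is no need to choose between a 3x3 and a 26x26 array up front.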

Rohit Barnwal · Answer · 2015-05-11 11:11:49Z

2

You can use this library http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html

You just need to make the proper adjustments.

Hope it helps.

answered May 11, 2015 at 11:11

Rohit Barnwal


1 Comment

Viki Over a year ago

Yes, my problem is to make the proper adjustment. I am new to python, sorry if my question is very basic. Thank you.

ypx · Answer · 2015-05-11 12:07:42Z

Your matrix seems to resemble the adjacency matrix of a graph.

I find the answer with pandas much more concise and elegant. Here's my attempt without adding pandas as an additional dependency.

from collections import namedtuple

import numpy as np

EdgeKey = namedtuple("EdgeKey", ["src", "dst"])

# Read every edge and store it in both directions, since g[A, B] == g[B, A]
g = dict()
with open('.txt', 'r') as f:
    for line in f:
        elems = line.split()
        key = EdgeKey(src=elems[0], dst=elems[1])
        g[key] = elems[2]
        key_rev = EdgeKey(src=elems[1], dst=elems[0])
        g[key_rev] = elems[2]

# Collect the sorted vertex labels
vertices = set()
for src, dst in g.keys():
    vertices.add(src)
    vertices.add(dst)

vertices = sorted(vertices)

# create adjacency matrix
mat = np.zeros((len(vertices), len(vertices)))
for s, src in enumerate(vertices):
    for d, dst in enumerate(vertices):
        e = EdgeKey(src=src, dst=dst)
        if e in g:
            mat[s, d] = int(g[e])

# print adjacency matrix
print('  ' + ' '.join(vertices))  # print header
for i, row in enumerate(mat):
    print(vertices[i] + ' ' + ' '.join([str(int(c)) for c in row.tolist()]))
