Populating python matrix

Question

I'm splitting the words of a text file in Python. I have the number of rows (c) and a dictionary (word_positions) that maps each unique word to an index. Then I create a zero matrix of shape (c, index). Here is the code:

from collections import defaultdict
import re
import numpy as np

c = 0

# Count the lines (rows) in the file; "with" closes the file afterwards.
with open('/Users/Half_Pint_Boy/Desktop/sentenses.txt', 'r') as f:
    for line in f:
        c = c + 1

word_positions = {}

with open('/Users/Half_Pint_Boy/Desktop/sentenses.txt', 'r') as f:
    index = 0
    for word in re.findall(r'[a-z]+', f.read().lower()):
        if word not in word_positions:
            word_positions[word] = index
            index += 1
print(word_positions)

# The shape must be one tuple: np.zeros(c, index) would treat index as the dtype and fail.
matrix = np.zeros((c, index))

My question: how can I populate the matrix so that matrix[c, index] = count, where c is the row number, index is the word's indexed position, and count is how many times that word occurs in the row?
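A minimal sketch of one way to fill the matrix, assuming each line of the file is one row and that words are tokenized with the same [a-z]+ pattern as above. A made-up sample string stands in for sentenses.txt so the example runs on its own:

```python
import re

import numpy as np

# Made-up sample text standing in for sentenses.txt (one sentence per line).
text = "The cat sat on the mat\nThe dog sat\n"
lines = text.lower().splitlines()

# Give each unique word a column index in order of first appearance,
# using the same [a-z]+ tokenization as the question.
word_positions = {}
for word in re.findall(r'[a-z]+', text.lower()):
    if word not in word_positions:
        word_positions[word] = len(word_positions)

# Shape passed as a single tuple (rows, unique words).
matrix = np.zeros((len(lines), len(word_positions)), dtype=int)

# Row = line number, column = word index; add 1 per occurrence.
for row, line in enumerate(lines):
    for word in re.findall(r'[a-z]+', line):
        matrix[row, word_positions[word]] += 1

print(word_positions)
print(matrix)
```

Each occurrence adds 1 at matrix[row, word_positions[word]], so after the loop matrix[c, index] holds how many times the word with that index appears on line c.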

It's not clear what you are trying to do. Could you add more explanation or a simple example?
– Andrew, Commented Jul 25, 2016 at 11:57
If you have a line as a string named lines, you can get the number of words with len(lines.split()) (the length of the list produced by splitting the string on whitespace).
– HolyDanna, Commented Jul 25, 2016 at 12:52
I have 22 rows and 254 unique words in the text, so that will be the size of my matrix. Then I just need to count, for each row, how many times every indexed unique word occurs. Hope it is clearer now.
– Keithx, Commented Jul 25, 2016 at 14:11
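The difference between the comments above can be seen in a short sketch (the sample line is made up): len(line.split()) gives one total per line, whereas the matrix Keithx describes needs a separate count for each unique word, which collections.Counter provides.

```python
import re
from collections import Counter

# Made-up sample line; the real data would come from the text file.
line = "the cat sat on the mat"

# HolyDanna's suggestion: total number of whitespace-separated words on the line.
total_words = len(line.split())

# What the matrix needs: how often each unique word occurs on the line.
per_word = Counter(re.findall(r'[a-z]+', line))

print(total_words)       # 6
print(per_word["the"])   # 2
print(per_word["dog"])   # 0, a Counter returns 0 for missing words
```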

user3882036 · Accepted Answer · 2016-07-25 16:41:04Z

Try the following:

import re
import numpy as np
from itertools import chain

with open('/Users/Half_Pint_Boy/Desktop/sentenses.txt') as text:
    text_list = text.readlines()

c = len(text_list)  # number of sentences (rows)

text_niz = []

for i in range(len(text_list)):
    text_niz.append(text_list[i].lower())  # converted to lowercase

slovo = []

for j in range(len(text_niz)):
    slovo.append(re.split('[^a-z]', text_niz[j]))  # tokenization

for e in range(len(slovo)):

    while slovo[e].count('') != 0:
        slovo[e].remove('')  # removed empty words

slovo_list = list(chain(*slovo))
print(slovo_list)  # built the list of words

slovo_list = list(set(slovo_list))  # removed duplicates
x = len(slovo_list)

s = []

for i in range(len(slovo)):
    for j in range(len(slovo_list)):
        s.append(slovo[i].count(slovo_list[j]))  # counted how many times each word occurs in each sentence

matr = np.array(s)  # matrix of word occurrences in sentences
d = matr.reshape((c, x))  # reshaped into a c x x matrix (22 x 254 here)
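Two details of this answer may be worth adjusting. set() has no defined order, so the column order of d can differ between runs (string hashing is randomized per process by default), and slovo[i].count(...) rescans the sentence once per vocabulary word. A compact alternative sketch, assuming a sorted vocabulary is an acceptable column order and using made-up sentences in place of the file:

```python
import re
from collections import Counter

import numpy as np

# Made-up sentences standing in for text.readlines().
text_list = ["The cat sat on the mat.\n", "The dog sat.\n"]

# Lowercase and tokenize; findall with [a-z]+ never yields empty strings,
# so no clean-up loop is needed.
slovo = [re.findall(r'[a-z]+', line.lower()) for line in text_list]

# Sorted vocabulary: a reproducible column order across runs.
slovo_list = sorted({word for words in slovo for word in words})

# One Counter per sentence, then read the counts out in vocabulary order.
counts = [Counter(words) for words in slovo]
d = np.array([[cnt[word] for word in slovo_list] for cnt in counts])

print(slovo_list)
print(d)
```

Each sentence is scanned once to build its Counter, and missing words read back as 0, so d has the same meaning as in the answer but with stable columns.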

rtmh · Accepted Answer · 2016-07-25 13:26:07Z

It looks like you are trying to create something similar to an n-dimensional list. These are built by nesting lists inside one another, like this:

two_d_list = [[0, 1], [1, 2], ["example", "blah", "blah blah"]]
words = two_d_list[2]
single_word = two_d_list[2][1]  # Notice the second index operator

This concept is very flexible in Python; the inner elements can also be dictionaries, as you would like:

two_d_list = [{"word":1}, {"example":1, "blah":3}]
words = two_d_list[1]  # type(words) == dict
single_word = two_d_list[1]["example"]  # Similar index operator, but for the dictionary

This achieves what you would like functionally, but it does not use the syntax matrix[c, index]. Plain Python lists do not support that syntax: matrix[c, index] passes the tuple (c, index) as the index, and a list raises a TypeError for it (NumPy arrays, like the one in the question, do accept it). With nested lists or dictionaries you instead access the row's element with matrix[c][index] = count.
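A sketch of populating such a list of dictionaries from text, assuming each line is one row and words are matched with the question's [a-z]+ pattern (the sample lines are made up):

```python
import re

# Made-up sentences standing in for the lines of the file.
lines = ["the cat sat on the mat", "the dog sat"]

# One independent dict per row; [{}] * len(lines) would share a single dict.
matrix = [{} for _ in lines]

for c, line in enumerate(lines):
    for word in re.findall(r'[a-z]+', line):
        # matrix[c][word] = count, incremented once per occurrence.
        matrix[c][word] = matrix[c].get(word, 0) + 1

print(matrix[0]["the"])         # 2
print(matrix[1].get("cat", 0))  # 0, word absent from that row
```

Words that never occur in a row are simply missing from that row's dictionary, so .get(word, 0) is the safe way to read a count.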

You may be able to overload the index operator to get the syntax you want; there is another question about achieving exactly this syntax. In summary:

Overload __getitem__(self, index) in a wrapper around the list class so that it accepts a tuple, and overload __setitem__(self, index, value) as well, since an assignment such as matrix[c, index] = count calls __setitem__. A tuple can be written without parentheses, which gives the syntax matrix[c, index] = count.
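A minimal sketch of that wrapper idea. The class name Matrix and its fill parameter are hypothetical, not taken from the other question; rows are stored as a list of lists:

```python
class Matrix:
    """Hypothetical wrapper around a list of rows that accepts m[row, col]."""

    def __init__(self, rows, cols, fill=0):
        # Independent row lists ([[fill] * cols] * rows would alias one row).
        self._rows = [[fill] * cols for _ in range(rows)]

    def __getitem__(self, key):
        # m[r, c] passes the tuple (r, c) as key.
        row, col = key
        return self._rows[row][col]

    def __setitem__(self, key, value):
        # m[r, c] = v calls __setitem__ with key == (r, c).
        row, col = key
        self._rows[row][col] = value


m = Matrix(2, 3)
m[1, 2] = 5
m[1, 2] += 1  # uses __getitem__, then __setitem__
print(m[1, 2])  # 6
```

Augmented assignment like m[c, index] += 1 works too, because Python reads the value through __getitem__ and writes it back through __setitem__.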
