I'm writing bag-of-words (BOW) code for a dataset of 30000 rows. I have X_train, which has shape (21000, 2). The 2 columns are title and description. So, I have the following code:

def text_to_bow(text: str) -> np.array:
    text = text.split()
    res = np.zeros(len(bow_vocabulary)) #bow_vocabulary includes 10000 most popular tokens
    for word in text:
        for i in range(len(bow_vocabulary)):
            if word == bow_vocabulary[i]:
                res[i] += 1
    return res
def items_to_bow(items: np.array) -> np.array:
    desc_index = 1
    res = np.empty((0,k), dtype='uint8')
    for i in range(len(items)):
        description = items[i][desc_index]
        temp = text_to_bow(description)
        res = np.append(res, [temp], axis=0)
    return np.array(res)

My code seems to work correctly, since it passes the several asserts in my task.

So, when I run:

X_train_bow = items_to_bow(X_train)

I get the error:

MemoryError: Unable to allocate 12.1 MiB for an array with shape (158, 10000) and data type float64

I've already set overcommit_memory to 1 in Ubuntu, but it hasn't helped. I don't want to use 64-bit Python either, because there may be problems with modules.

I've also tried another function (with a regular Python list):

def items_to_bow(items: np.array) -> np.array:
    desc_index = 1
    res = []
    for i in range(len(items)):
        description = items[i][desc_index]
        temp = text_to_bow(description)
        res.append(temp)
        if len(res) % 1000 == 0:  # report progress every 1000 rows
            print(len(res))
    return np.array(res)

But it runs for an hour or so, which is impractical.

Are there any ways to solve the problem? Would be grateful for any possible help.

  • Provide sample data. Commented Apr 18, 2020 at 14:44
  • The data: link Commented Apr 18, 2020 at 14:47

1 Answer

Process the data in chunks. In pandas, the chunksize parameter (for example, of read_csv) does this. Read a chunk of data, process it, and append the output to a file. Make sure the chunk is deleted before reading the next one, then repeat.
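
A minimal sketch of that loop, under some assumptions: the inline CSV, the four-word vocabulary, the helper chunk_to_bow, and the output file name are hypothetical stand-ins for the real data. It also swaps the inner vocabulary loop for a dict lookup and stores counts as uint8, which are changes to the question's code rather than part of chunking itself. For scale, a dense float64 matrix of shape (21000, 10000) needs about 1.6 GiB, while uint8 needs about 200 MiB.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real CSV file and the 10000-token vocabulary.
csv_text = "title,description\na,red apple red\nb,green pear\nc,red pear pear\n"
bow_vocabulary = ["red", "green", "apple", "pear"]

# Build the token -> column map once, so each word lookup is O(1)
# instead of a scan over the whole vocabulary.
token_index = {tok: i for i, tok in enumerate(bow_vocabulary)}


def chunk_to_bow(descriptions):
    """Return a (len(descriptions), vocab_size) matrix of token counts.

    Assumption: no token occurs more than 255 times in one description,
    otherwise the uint8 counts would wrap around.
    """
    out = np.zeros((len(descriptions), len(bow_vocabulary)), dtype=np.uint8)
    for row, text in enumerate(descriptions):
        for word in str(text).split():
            col = token_index.get(word)
            if col is not None:
                out[row, col] += 1
    return out


out_path = os.path.join(tempfile.mkdtemp(), "bow.uint8")  # hypothetical name
n_rows = 0
with open(out_path, "wb") as f:
    # chunksize=2 only because the toy data is tiny; use thousands in practice.
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
        bow = chunk_to_bow(chunk["description"])
        bow.tofile(f)        # append this chunk's rows to the file
        n_rows += len(bow)
        del chunk, bow       # drop references before reading the next chunk

# Map the finished file back as an array without loading it all into RAM.
result = np.memmap(out_path, dtype=np.uint8, mode="r",
                   shape=(n_rows, len(bow_vocabulary)))
print(result.shape)
```

Only one chunk's matrix is held in memory at a time, so peak usage depends on chunksize rather than on the number of rows in the dataset.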
