I am getting a memory error while iterating over a pandas DataFrame. How do I resolve this?

Question

I want to multiply each column by a different number and update the values in this DataFrame.

The code I have written is:

for j in test.columns:
    for i in r:
        for k in range(len(p)):
            test[i] = test[j].apply(lambda x:x*p[k])
            p.remove(p[k])
            break
        r.remove(i)
        break

Here p is the list of numbers that I want to multiply by.

p = [74, 46, 97, 2023, 364, 1012, 8, 242, 422, 78, 55, 90, 10, 44, 1, 3, 105, 354, 4, 26, 87, 18, 889, 9, 557, 630, 214, 1765, 760, 3344, 136, 26, 56, 10, 2, 2171, 125, 446, 174, 4, 174, 2, 80, 11, 160, 17, 72]

r is the list of column names.

How can I get rid of this error?

meta.stackoverflow.com/questions/285551/… - Please don't paste images of code; it makes searching impossible.
– tomgalpin, Commented Jan 30, 2020 at 7:25
Please add your code and dataframe as text, not as a picture, so we can solve your problem. It looks as though you are using redundant for loops plus apply, which take a lot of memory.
– E. Gertz, Commented Jan 30, 2020 at 7:25
Yes. Each column with the respective number in the list, in the same order.
– dj2560, Commented Jan 30, 2020 at 7:34

Fabrizio · Accepted Answer · 2020-01-30 08:17:31Z

I based this answer on your initial statement, "I want to multiply each column with a different number". It's unclear why your code calls remove so many times and uses so many for loops. For this example, I generated a random DataFrame with 100 rows and 5 columns, and an array of 5 values to multiply by.

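import numpy as np
import pandas as pd
# Random 100x5 DataFrame and an array of 5 random multipliers
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('12345'))
p = np.random.randint(0, 100, 5)
# Multiply column i by p[i], one column at a time
for i in range(5):
    df.iloc[:, i] = df.iloc[:, i] * p[i]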
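
The loop above can probably be replaced by a single vectorized call. The following is a minimal sketch, assuming p has exactly one entry per column in column order; the seeded generator and the name scaled are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same shapes as in the answer above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 5)), columns=list("12345"))
p = rng.integers(0, 100, size=5)

# Multiply column i by p[i] for all columns at once.
# axis="columns" matches the entries of p against the columns.
scaled = df.mul(p, axis="columns")
```

This returns a new DataFrame and leaves df unchanged; assign the result back to df if an in-place update is wanted.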

answered Jan 30, 2020 at 8:17


Valdi_Bo · Accepted Answer · 2020-01-30 08:38:05Z

Your stacktrace points to test[i] = test[j].apply(lambda x:x*p[k]).

Note that j (at least in your code sample) has not been set.

Maybe you should put i instead?
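
If the aim is one factor per column, the nested loops with remove and break can likely be collapsed into one loop over paired names and factors. A minimal sketch, assuming r and p have the same length and order; the small DataFrame and the shortened p below are made-up stand-ins for your data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's test, r, and p.
test = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
r = ["a", "b", "c"]   # column names
p = [74, 46, 97]      # one factor per column, in the same order as r

# Pair each column name with its factor; neither list is modified.
for col, factor in zip(r, p):
    test[col] = test[col] * factor
```

Because nothing is removed from r or p while looping, both lists can be reused afterwards.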

Another solution

If you want to multiply:

each column from test,
in-place,
by consecutive numbers from p (it may be even a plain Python list),
but only as many initial elements as p has,
assuming that p is not longer than the number of rows in test,

you can do it with the following one-liner:

test.iloc[:len(p)] = test.iloc[:len(p)].apply(lambda col: col * p)

To test this solution, I created a test DataFrame containing the first 10 rows from your sample.

Then I defined p as: p = [2, 3, 4, 5, 6, 7].

The result of my code was:

    0   1    2     3    4
0   6   8    8   282   42
1  39  24   42  1434  153
2   4   0    8   336   48
3  40  20   65  1085  160
4  84  66   72  2130  366
5  91  49  119  3283  469
6   5   6   11   140   17
7   4   8   12   278   51
8   6   8   12   271   36
9  29  25   37   741  149

So, as far as the first 6 rows are concerned, in each column:

the first element has been multiplied by 2,
the second by 3,
and so on.

Maybe this is just what you need?
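
For a self-contained check of the one-liner, here is a small sketch on made-up data (the 4x2 frame and the values of p are assumptions, not taken from your sample). It shows that the factors are applied down the rows: row k of every column is multiplied by p[k].

```python
import pandas as pd

# Hypothetical 4x2 frame; p is shorter than the number of rows.
test = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
p = [2, 3, 4]

# Only the first len(p) rows are touched; each column is
# multiplied element by element with p.
n = len(p)
test.iloc[:n] = test.iloc[:n].apply(lambda col: col * p)
```

The last row is left as it was because p has only three entries.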
