I am quite new to Python. Searching previous questions, I couldn't find an answer to this problem.

For a project I have to analyze a lot of .txt files and always perform the same calculations on them. I used pandas to create a DataFrame, which works nicely.

I want an extra column with calculations performed on other columns, so for example c = a + b. For simple calculations this works just fine:

In [41]: import pandas as pd
In [42]: import numpy as np

In [43]: df = pd.DataFrame(np.random.randn(10,2),columns=list('ab'))

In [44]: df
Out[45]: 
          a         b
0  0.163138 -1.261099
1  0.094772 -0.553349
2 -1.677519 -0.966680
3  1.732083 -1.118715
4  0.172240 -0.404648
5  0.270712  0.089841
6  0.589787  1.569790
7  0.822016  0.857993
8 -0.269941  0.586059
9 -0.152639  0.240438

In [46]: df["c"] = df["a"] + df["b"]

In [47]: df
Out[48]: 
          a         b         c
0  0.163138 -1.261099 -1.097961
1  0.094772 -0.553349 -0.458577
2 -1.677519 -0.966680 -2.644198
3  1.732083 -1.118715  0.613368
4  0.172240 -0.404648 -0.232407
5  0.270712  0.089841  0.360554
6  0.589787  1.569790  2.159576
7  0.822016  0.857993  1.680010
8 -0.269941  0.586059  0.316118
9 -0.152639  0.240438  0.087800

The problem occurs when I use more "complex" calculations:

# C1 and C2 are some constants needed for the calculations

In [49]: C1 = 1.5

In [50]: C2 = 2.5

In [51]: df["c"] = df["a"] + [(C1 * df["a"]) + (C2 * df["b"] ** 2)]

Exception: Data must be 1-dimensional 

Is there a workaround to this problem? Or am I handling this completely wrong?

1 Answer

First, the error comes from wrapping the result of the inner calculation in square brackets. Removing them fixes the error:

In [157]:

df["c"] = df["a"] + (C1 * df["a"]) + (C2 * df["b"] ** 2)
df
Out[157]:
          a         b         c
0  0.163138 -1.261099  4.383772
1  0.094772 -0.553349  1.002418
2 -1.677519 -0.966680 -1.857622
3  1.732083 -1.118715  7.459016
4  0.172240 -0.404648  0.839950
5  0.270712  0.089841  0.696959
6  0.589787  1.569790  7.635069
7  0.822016  0.857993  3.895420
8 -0.269941  0.586059  0.183810
9 -0.152639  0.240438 -0.237071

The issue is that the inner calculation produces a list containing a Series:

In [159]:
[(C1 * df["a"]) + (C2 * df["b"] ** 2)]
Out[159]:
[0    4.220634
 1    0.907646
 2   -0.180103
 3    5.726933
 4    0.667710
 5    0.426247
 6    7.045282
 7    3.073404
 8    0.453751
 9   -0.084432
 dtype: float64]

You then try to add the other column (a Series) to that list, and pandas doesn't understand how to align the two, which raises the exception.
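
To check the difference yourself, here is a minimal sketch. The three-row DataFrame and the names `wrapped` and `inner` are made up for illustration (they are not from the question), and the values are chosen so the arithmetic is easy to verify by hand:

```python
import pandas as pd

# Constants from the question; the data below is hypothetical.
C1 = 1.5
C2 = 2.5
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 0.0, -2.0]})

# Square brackets build a Python list whose single element is a Series.
wrapped = [(C1 * df["a"]) + (C2 * df["b"] ** 2)]
print(type(wrapped).__name__, len(wrapped))      # list 1
print(type(wrapped[0]).__name__)                 # Series

# Parentheses only group the expression, so the result stays a Series.
inner = (C1 * df["a"]) + (C2 * df["b"] ** 2)
print(type(inner).__name__, inner.shape)         # Series (3,)

# Series + Series aligns on the index and yields a 1-dimensional column.
df["c"] = df["a"] + inner
print(df["c"].tolist())                          # [12.5, 5.0, 17.5]
```

So if you only want to group part of an expression, use parentheses; square brackets always create a list.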


2 Comments

Thanks, removing the brackets worked fine! And indeed, I just wanted to multiply the result of df["b"] ** 2 by the constant C2, so that part was already ok.
OK wasn't sure, will remove the last part
