
I am trying to compute a metric with pandas DataFrames. In particular, I get a results object and compute predictions with

prediction = results.predict(start=1, end=len(test), exog=test)

The actual values are in a DataFrame column given by

test['actual'].

I need to compute two things:

  1. How can I compute the sum of squared errors? Basically, I would do an element-by-element subtraction and then sum the squares of these differences (both quantities are written in summation form just after this list).

  2. How can I compute the sum of squares of the predicted values minus the mean of the actual values? So it would be

    (x1 - mean_actual)^2 + (x2 - mean_actual)^2 + ... + (xn - mean_actual)^2
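
In summation notation (writing p_1, ..., p_n for the predictions, a_1, ..., a_n for the actual values, and \bar{a} for the mean of the actuals; these symbols are chosen here just for illustration), the two quantities are

    \sum_{i=1}^{n} (p_i - a_i)^2 \qquad \text{and} \qquad \sum_{i=1}^{n} (p_i - \bar{a})^2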
    

1 Answer

The first one would be:

((prediction - test['actual']) ** 2).sum()

The second one would be:

((prediction - test['actual'].mean()) ** 2).sum()
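
Put together, a minimal self-contained sketch of both computations (the toy numbers below are made up purely for illustration; in the question, prediction comes from results.predict and test holds the actual values):

    import pandas as pd

    # Toy data standing in for the real objects; the 'actual' column name is taken from the question
    test = pd.DataFrame({'actual': [3.0, 5.0, 2.5, 7.0]})
    prediction = pd.Series([2.8, 5.4, 2.9, 6.5], index=test.index)

    # 1. Sum of squared errors: element-wise difference, squared, then summed
    sse = ((prediction - test['actual']) ** 2).sum()

    # 2. Sum of squared deviations of the predictions from the mean of the actuals
    ss_pred_mean = ((prediction - test['actual'].mean()) ** 2).sum()

    print(sse, ss_pred_mean)

Both expressions rely on prediction and test['actual'] sharing the same index, since pandas aligns on labels before subtracting.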

4 Comments

I get a NaN value for the first one. What does that imply?
Do you have any NaNs in your data?
I think these should be .sum()
@AndyHayden AFAIK, sum(...) will be slower than .sum()? Is it possible to do it right with np.sum(...)?
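
On the NaN question above: pandas arithmetic aligns on the index, so a mismatch between the index of prediction and that of test['actual'] can introduce NaNs even when neither input contains any. A minimal sketch of checks one might run, assuming both objects are pandas Series as in the question (if prediction were a plain NumPy array, pd.isna(prediction) would be needed instead of .isna()):

    # Check for NaNs in the inputs themselves
    print(prediction.isna().sum())
    print(test['actual'].isna().sum())

    # Subtraction aligns on the index: labels present in only one of the
    # two objects become NaN in the difference
    diff = prediction - test['actual']
    print(diff.isna().sum())

    # If the two are known to be in the same order, a positional (index-free)
    # computation sidesteps alignment problems
    sse = ((prediction.to_numpy() - test['actual'].to_numpy()) ** 2).sum()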
