I'm reading a reference [1] on data analysis with Python and testing its code on my laptop. The text discusses how using NumPy arrays can speed things up compared to using built-in lists.

I'm surprised, however, to get exactly the opposite result:

In [5]: L =range(10000000); %timeit sum(L)
1 loops, best of 3: 201 ms per loop

In [9]: xL=np.array(L,dtype=int); %timeit sum(xL)
1 loops, best of 3: 6.79 s per loop

The first sum is supposed to be much slower than the second. Changing the dtype option value doesn't change the result.

I'm using the IPython (2.4.0) notebook with Firefox on OS X 10.6.8. Could this be a problem with my (old) version of Python or the OS?

[1] "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data", Zeljko Ivezic et al., Princeton Univ. Press 2014. Appendix A.8.

  • I apologize for such a dump question. I should have been more careful when copying the code. I'm not sure if I should delete this question, after having gotten two correct answers though. Commented Dec 11, 2015 at 20:50
  • @unutbu , wim, you both have high enough a reputation for not changing much if I delete my question (I guess that would remove the reputation you gained here). Honestly, I think this question of mine just contributes to the noise in here. Commented Dec 11, 2015 at 21:07
  • MASL I think your question is fine, there is no such thing as dump questions. Only dumb questions. Commented Dec 11, 2015 at 21:11
  • :) And that too! Definitely not the right day. Commented Dec 11, 2015 at 21:16

2 Answers

You need to call the NumPy array's sum method, not the plain Python builtin sum function, in order to take advantage of NumPy:

In [32]: L =range(10000000)

In [33]: %timeit sum(L)
10 loops, best of 3: 82.4 ms per loop

In [34]: xL=np.array(L,dtype=int)

In [35]: %timeit xL.sum()
100 loops, best of 3: 9.49 ms per loop
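
To reproduce the comparison outside IPython, a minimal sketch using the standard timeit module might look like the following. The helper name compare_sums and its parameters are my own, not from the book or the question. It assumes Python 3, where range has to be wrapped in list() to get an actual list, and it pins the dtype to np.int64 so the total cannot overflow on platforms where the default integer is 32-bit.

```python
import timeit

import numpy as np


def compare_sums(n=1_000_000, repeats=3):
    """Time three ways of summing 0..n-1 (hypothetical helper for illustration).

    Returns a dict mapping a label to (total, best_time_in_seconds).
    """
    L = list(range(n))                # plain Python list
    xL = np.array(L, dtype=np.int64)  # same values as a NumPy array

    candidates = {
        "builtin sum on list": lambda: sum(L),
        "builtin sum on array": lambda: sum(xL),  # the slow case from the question
        "ndarray.sum()": lambda: xL.sum(),        # the fix from the answer
    }

    results = {}
    for label, fn in candidates.items():
        # Run each candidate once per repeat and keep the best time.
        best = min(timeit.repeat(fn, number=1, repeat=repeats))
        results[label] = (int(fn()), best)
    return results


if __name__ == "__main__":
    for label, (total, seconds) in compare_sums().items():
        print(f"{label:22s} total={total} best={seconds * 1e3:.2f} ms")
```

The absolute numbers depend on the machine and the NumPy version, so only the relative ordering is worth reading off.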

1 Comment

Arg! My bad! I should have been more careful. Thanks!

You're using Python's built-in sum on the NumPy array instead of NumPy's sum:

>>> import numpy as np
>>> L = range(10000000)
>>> timeit sum(L)
10 loops, best of 3: 69.9 ms per loop
>>> xL = np.array(L, dtype=int)
>>> timeit sum(xL)
1 loops, best of 3: 715 ms per loop

Slooooow! Here's the 10x speedup:

>>> timeit xL.sum()
100 loops, best of 3: 7.34 ms per loop
>>> timeit np.sum(xL)
100 loops, best of 3: 7.38 ms per loop
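
As for why the built-in sum is so slow on an array: it iterates element by element, and each element is handed back as a separate NumPy scalar object, whereas xL.sum() and np.sum(xL) run the loop in compiled code. A small sketch of that, with the array size and variable names chosen only for illustration:

```python
import numpy as np

xL = np.arange(5, dtype=np.int64)

# Iterating over the array creates one NumPy scalar object per element;
# the built-in sum has to do this for every item it adds.
first = next(iter(xL))
print(type(first))

# All three routes give the same total; only the looping strategy differs.
totals = (sum(xL), xL.sum(), np.sum(xL))
print(totals)
```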
