I'm new to Python, so I've decided to solve some common challenges to improve my knowledge of the language. I learned about NumPy and its efficient ndarrays, so I attempted the following experiment:
Consider the 2-sum problem (e.g. here) and let's solve it the naive way (the algorithm doesn't matter for the purpose of this question). Here's a solution with Python's lists:
from itertools import combinations

def twosum1(n_lst):
    pairs = list(combinations(n_lst, 2))
    solutions = []
    for pair in pairs:
        if sum(pair) == 7:
            solutions.append(pair)
    return solutions
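A quick sanity check on a small input of my own choosing (the function is repeated so the snippet runs on its own):

```python
from itertools import combinations

def twosum1(n_lst):
    pairs = list(combinations(n_lst, 2))
    solutions = []
    for pair in pairs:
        if sum(pair) == 7:
            solutions.append(pair)
    return solutions

# All pairs from 1..6 that sum to 7, in generation order
print(twosum1([1, 2, 3, 4, 5, 6]))  # [(1, 6), (2, 5), (3, 4)]
```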
Then I created a version using NumPy arrays, expecting it to drastically speed up the calculation:
from itertools import combinations
import numpy as np

def twosum2(n_lst):
    pairs = np.array(list(combinations(n_lst, 2)), dtype=int)
    return pairs[pairs[:, 0] + pairs[:, 1] == 7]
However, after timing the two functions, twosum2 is about 2x slower than twosum1. I thought the problem might be in the dynamic selection of elements, so I wrote an exact copy of twosum1, replacing the lists with ndarrays ...
def twosum3(n_lst):
    pairs = np.array(list(combinations(n_lst, 2)))
    solutions = np.empty((0, 2))
    for pair in pairs:
        if np.sum(pair) == 7:
            solutions = np.append(solutions, [pair], axis=0)
    return solutions
... and the resulting function was 10x slower than the original!
How is this possible? What am I doing wrong here? Clearly, removing loops and replacing lists with ndarrays is not enough to gain speed (contrary to what I learned reading this).
Edit:
- I use %timeit in jupyter to time the functions.
- I use identical inputs for all the functions I'm timing.
- The fact that I compute the combinations in the same way in all 3 functions tells me that the slowdown is due to NumPy ... but I don't see how.
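For completeness, the same measurement can be reproduced outside Jupyter with the standard timeit module (the benchmark input below is my own; %timeit does essentially this with automatic calibration):

```python
import timeit
from itertools import combinations

def twosum1(n_lst):
    pairs = list(combinations(n_lst, 2))
    solutions = []
    for pair in pairs:
        if sum(pair) == 7:
            solutions.append(pair)
    return solutions

n_lst = list(range(100))

# Total time for 100 calls; divide to get the per-call average
t = timeit.timeit(lambda: twosum1(n_lst), number=100)
print(f"twosum1: {t / 100:.6f} s per call")
```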


Comments:
- What is in n_lst? There is some copying overhead in the NumPy solutions when you create an array from a list. And could you also mention or show your timing methodology?
- The real cost is list(combinations(n_lst, 2)). Adding a NumPy wrapper after forcing the whole generator into memory is just clobbering your RAM for no good purpose. The actual comparison is not the bottleneck at all.
- np.append does not work in place anyway; it always creates new arrays. np.append is just a confusing front end to np.concatenate and should be deprecated. Building an array by repeated concatenation is slow. It's better to build a list and do one array construction at the end.
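To illustrate that last point, a small sketch of my own comparing repeated np.append against collecting rows in a Python list and converting once at the end:

```python
import numpy as np

rows = [[1, 6], [2, 5], [3, 4]]

# Slow pattern: every np.append allocates a new array and
# copies everything accumulated so far (O(n^2) total copying)
arr = np.empty((0, 2), dtype=int)
for row in rows:
    arr = np.append(arr, [row], axis=0)

# Fast pattern: accumulate in a list, one array construction at the end
arr2 = np.array(rows, dtype=int)

print(np.array_equal(arr, arr2))  # True
```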