In a NumPy array of objects (where each object has a numeric attribute y that can be retrieved by the method get_y()), how do I obtain the index of the object with the maximum (or minimum) y attribute without explicit looping, to save time? If myarray were a Python list, I could use the following:

    minindex = myarray.index(min(myarray, key=lambda x: x.get_y()))

However, ndarray does not seem to support index(), and numpy.argmin does not seem to accept a key argument.
Why is it an array, instead of a list? You already know how to work with a list. Iteration on a list is faster. – hpaulj, Sep 30, 2019 at 2:52
Fast numpy calculations require numeric dtypes, not object. – hpaulj, Sep 30, 2019 at 2:58
Thanks, but wouldn't something like numpy.argmin(), if one were (hypothetically) available for this situation, be faster than list iteration? I am a bit confused. Please help. – auser, Sep 30, 2019 at 3:00
That's not how numpy works; operations aren't just magically faster because they are inside another object. Numpy is faster because arrays of a numeric dtype have a fixed-size representation, so the values can be stored contiguously in memory and loaded into cache efficiently, unlike a list, where each element is simply a pointer to some other object floating around in memory that has to be looked up. An array with the object dtype stores pointers in the same way, just like a Python list, so there's no benefit to using numpy here. – alkasm, Sep 30, 2019 at 3:16
@alkasm: Thanks very much, but I thought the object dtype is treated similarly in memory when the objects of the array are homogeneous (the same type and size), because the size of each object is known from either the explicit user-specified dtype or implicitly as the sum of the sizes of the components in an object, thus allowing a fixed-size, contiguous allocation. I need to study this more carefully. Can you please suggest some reading material (a source) for me? Thanks again. – auser, Sep 30, 2019 at 3:36
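The point debated in these comments can be checked directly. The following is a minimal sketch, assuming NumPy is installed; the names x and xo are illustrative. It shows that an object array stores one reference per element even when every element has the same Python type, so the fixed-size, contiguous layout applies only to the references and not to the values themselves.

```python
import numpy as np

# A numeric array: each element is a raw 8-byte integer stored inline.
x = np.arange(1000, dtype=np.int64)

# The same values as an object array: each element is a reference
# to a separate Python int living elsewhere in memory.
xo = x.astype(object)

pointer_size = np.dtype(np.intp).itemsize  # size of a pointer-sized integer on this platform

print("numeric dtype:", x.dtype, "itemsize:", x.itemsize)
print("object dtype: ", xo.dtype, "itemsize:", xo.itemsize)
print("pointer size: ", pointer_size)

# Indexing the numeric array builds a NumPy scalar from the raw bytes;
# indexing the object array hands back the referenced Python object.
print(type(x[0]), type(xo[0]))
```

Because the object array only holds references, NumPy has to go through each Python object to do arithmetic on it, much as iterating over a list does.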
1 Answer
Some timings, comparing a numeric dtype, object dtype, and lists. Draw your own conclusions:
In [117]: x = np.arange(1000)
In [118]: xo=x.astype(object)
In [119]: np.sum(x)
Out[119]: 499500
In [120]: np.sum(xo)
Out[120]: 499500
In [121]: timeit np.sum(x)
10.8 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [122]: timeit np.sum(xo)
39.2 µs ± 673 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [123]: sum(x)
Out[123]: 499500
In [124]: timeit sum(x)
214 µs ± 6.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [125]: timeit sum(xo)
25.3 µs ± 4.54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [126]: timeit sum(x.tolist())
29.1 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [127]: timeit sum(xo.tolist())
14.4 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [129]: %%timeit temp=x.tolist()
...: sum(temp)
6.27 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
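Applied to the original question, one possible approach is to pull the y values out into a numeric array once and then use argmin or argmax on that. This is a sketch under assumptions, not a way to avoid the Python-level pass: the Point class is a hypothetical stand-in for the asker's objects, and calling get_y() on each element still costs one Python call per object. A plain-Python alternative that needs no NumPy is shown alongside for comparison.

```python
import numpy as np


class Point:
    """Hypothetical stand-in for the objects in the question."""

    def __init__(self, y):
        self._y = y

    def get_y(self):
        return self._y


# 1-D object array, as described in the question.
myarray = np.array([Point(3.0), Point(-1.5), Point(7.25), Point(0.0)], dtype=object)

# One Python-level pass to extract the attribute into a float array.
# This step is a loop in disguise; only the argmin/argmax that follow are fast.
ys = np.fromiter((obj.get_y() for obj in myarray), dtype=float, count=len(myarray))

minindex = int(np.argmin(ys))
maxindex = int(np.argmax(ys))
print(minindex, maxindex)  # prints: 1 2

# Plain-Python equivalent: take the minimum over indices, keyed by the attribute.
py_minindex = min(range(len(myarray)), key=lambda i: myarray[i].get_y())
print(py_minindex)  # prints: 1
```

If the y values are needed repeatedly, keeping them in a separate numeric array alongside the objects avoids repeating the extraction pass.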