How to find the index of the min/max object in a numpy array of objects?

Question

In a numpy array of objects (where each object has a numeric attribute y that can be retrieved by the method get_y()), how do I obtain the index of the object with the maximum (or minimum) y attribute without explicit looping, to save time? If myarray were a Python list, I could use the following, but ndarray does not seem to support index. Also, numpy argmin does not seem to accept a key function.

minindex = myarray.index(min(myarray, key = lambda x: x.get_y()))

Why is it an array instead of a list? You already know how to work with a list. Iteration on a list is faster.
– hpaulj, Commented Sep 30, 2019 at 2:52
Thanks, but wouldn't something like numpy.argmin(), if one were (hypothetically) available for this situation, be faster than list iteration? I am a bit confused. Please help.
– auser, Commented Sep 30, 2019 at 3:00
That's not how numpy works; operations aren't magically faster just because they are inside another object. Numpy is faster because arrays can have a fixed representation, so the values can be stored sequentially in memory and loaded into cache in contiguous chunks, unlike a list, where each element is simply a pointer to some other object floating around in memory that has to be looked up. An array with object dtype stores pointers in the same way a Python list does, so there's no benefit to using numpy here.
– alkasm, Commented Sep 30, 2019 at 3:16
@alkasm: Thanks very much, but I thought the object dtype is treated similarly in memory when the objects of the array are homogeneous (the same type and size), because the size of each object is known either from the explicit user-specified dtype or implicitly as the sum of the sizes of the components in an object, thus allowing a fixed-size, contiguous allocation. I need to study this more carefully. Can you please suggest some reading material? Thanks again.
– auser, Commented Sep 30, 2019 at 3:36
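
The point about object dtype in the comments above can be checked directly. The following is a minimal sketch, not part of the original thread: it compares a numeric array with its object-dtype copy and suggests that the object array stores only pointer-sized references to ordinary Python objects, even when every element has the same type.

```python
import numpy as np

x = np.arange(1000)       # numeric dtype: values stored inline in one buffer
xo = x.astype(object)     # object dtype: buffer holds references to Python ints

# Size in bytes of one slot in each array's buffer.
numeric_slot = x.itemsize
object_slot = xo.itemsize

# For object dtype the slot is pointer-sized, whatever the objects contain.
pointer_size = np.dtype(np.intp).itemsize

print(x.dtype, numeric_slot)
print(xo.dtype, object_slot, pointer_size)

# Elements of the object array are plain Python ints, not numpy scalars.
print(type(xo[0]), type(x[0]))
```

Because each slot is only a reference, numpy has to follow it and call into the Python object for every element, much as iterating over a list does.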

hpaulj · Accepted Answer · 2019-09-30 06:56:45Z

Some timings, comparing a numeric dtype, object dtype, and lists. Draw your own conclusions:

In [117]: x = np.arange(1000)                                                   
In [118]: xo=x.astype(object)                                                   

In [119]: np.sum(x)                                                             
Out[119]: 499500
In [120]: np.sum(xo)                                                            
Out[120]: 499500

In [121]: timeit np.sum(x)                                                      
10.8 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [122]: timeit np.sum(xo)                                                     
39.2 µs ± 673 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [123]: sum(x)                                                                
Out[123]: 499500
In [124]: timeit sum(x)                                                         
214 µs ± 6.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [125]: timeit sum(xo)                                                        
25.3 µs ± 4.54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [126]: timeit sum(x.tolist())                                                
29.1 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [127]: timeit sum(xo.tolist())                                               
14.4 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [129]: %%timeit  temp=x.tolist() 
     ...: sum(temp)                                                                      
6.27 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
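
Applied to the original question, these timings suggest extracting the numeric values once and then working on a numeric array or a plain list. Below is a minimal sketch under stated assumptions: the Point class is a hypothetical stand-in for the asker's objects, and only get_y() comes from the question. Both options still call get_y() once per element in Python; there is no way around that for an object array.

```python
import numpy as np


class Point:
    """Hypothetical stand-in for the asker's objects (assumption)."""

    def __init__(self, y):
        self._y = y

    def get_y(self):
        return self._y


myarray = np.array([Point(3.0), Point(-1.5), Point(7.25), Point(0.0)],
                   dtype=object)

# Option 1: copy the y values into a numeric array, then use argmin/argmax.
# The generator loops in Python once; argmin/argmax then run on floats.
ys = np.fromiter((p.get_y() for p in myarray), dtype=float, count=len(myarray))
minindex = int(np.argmin(ys))
maxindex = int(np.argmax(ys))

# Option 2: pure Python with a key function, no intermediate array.
minindex_py = min(range(len(myarray)), key=lambda i: myarray[i].get_y())
maxindex_py = max(range(len(myarray)), key=lambda i: myarray[i].get_y())

print(minindex, maxindex)        # 1 2
print(minindex_py, maxindex_py)  # 1 2
```

If the y values are needed repeatedly, keeping them in a separate numeric array alongside the objects is likely to pay off more than either option above.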
