This might not be a perfect answer, but I hope I can help you:
1.) Why isn't it working properly? Because `dtype=[('', arr.dtype)] * arr.shape[1]` is not the same as `dtype=[('', arr.dtype, arr.shape[1])]`.
2.) What's the difference between the two? The first one multiplies the list, while the second one adds the length to the tuple inside the list.
This means the first one produces something like `[('', dtype('O')), ('', dtype('O')), ('', dtype('O'))]` (three separate fields), whereas the second produces `[('', dtype('O'), 3)]` (a single field holding a subarray of length 3).
3.) Is the sort clearly doing something wrong? No, the input was just in the wrong format.
4.) Is it using pointers as the sort keys? If you are asking whether it sorts by some separate key rather than by the data, then no: it sorts according to the data itself.
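A quick way to see the difference from point 2 is to build both dtypes directly and inspect them. This is a minimal sketch in which `object` and `3` are assumed stand-ins for `arr.dtype` and `arr.shape[1]`:

```python
import numpy as np

n_cols = 3  # assumed stand-in for arr.shape[1]

# Multiplying the list: n_cols separate (name, type) tuples -> n_cols fields.
# Empty names are replaced by numpy's default names f0, f1, f2.
dt_fields = np.dtype([('', object)] * n_cols)

# Adding the length to the tuple: one field that holds a subarray of n_cols items.
dt_subarray = np.dtype([('', object, n_cols)])

print(dt_fields)    # three object fields named f0, f1, f2
print(dt_subarray)  # a single field f0 with shape (3,)
print(dt_fields.itemsize == dt_subarray.itemsize)  # True: same bytes per row
```

Both describe a row of the same size; only the split into fields differs, and that split is what the sort later walks through.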
EDIT:
OK, to make it clearer:
First of all, I think you misunderstood @Mike MacNeil's answer. To make it more concrete, here are some examples.
Let's consider a class Foo:
```python
class Foo:
    def __init__(self, id):
        self._id = id

    def get_id(self):
        return self._id

    def __le__(self, ob):
        return self < ob or self == ob

    def __lt__(self, ob):
        return self.get_id() < ob.get_id()

    def __ge__(self, ob):
        return not self < ob

    def __gt__(self, ob):
        return not self <= ob

    def __eq__(self, ob):
        return self.get_id() == ob.get_id()

    def __str__(self):
        return f'Foo({self.get_id()})'

    def __repr__(self):
        rep = super().__repr__()
        return f'{str(self)} {rep[rep.index("at"):rep.index(">")]}'
```
The full set of comparison operators is implemented, just like `str` has it, and Foo objects are ordered by their id. I also implemented the `__repr__()` and `__str__()` methods; give me a second and you will understand why.
Let's create a numpy array in the first step:
```python
>>> import numpy as np
>>> arr4 = np.array([[Foo(1), Foo(2), Foo(3)],
...                  [Foo(4), Foo(5), Foo(6)],
...                  [Foo(7), Foo(8), Foo(9)],
...                  [Foo(10), Foo(11), Foo(12)]])
```
If we print it, it will look something like this:
```python
>>> arr4
array([[Foo(1) at 0x000002411F753F08, Foo(2) at 0x000002411F73FF48, Foo(3) at 0x000002411F74EE48],
       [Foo(4) at 0x000002411F74EE88, Foo(5) at 0x000002411F74EE08, Foo(6) at 0x000002411F756148],
       [Foo(7) at 0x000002411F7561C8, Foo(8) at 0x000002411F756208, Foo(9) at 0x000002411F756248],
       [Foo(10) at 0x000002411F756288, Foo(11) at 0x000002411F7562C8, Foo(12) at 0x000002411F756308]], dtype=object)
```
If we now print the structured `np.ndarray` built on top of it...
```python
>>> np.ndarray(arr4.shape[0], dtype=[('', arr4.dtype, arr4.shape[1])], buffer=arr4)
array([([Foo(1) at 0x000002411F753F08, Foo(2) at 0x000002411F73FF48, Foo(3) at 0x000002411F74EE48],),
       ([Foo(4) at 0x000002411F74EE88, Foo(5) at 0x000002411F74EE08, Foo(6) at 0x000002411F756148],),
       ([Foo(7) at 0x000002411F7561C8, Foo(8) at 0x000002411F756208, Foo(9) at 0x000002411F756248],),
       ([Foo(10) at 0x000002411F756288, Foo(11) at 0x000002411F7562C8, Foo(12) at 0x000002411F756308],)], dtype=[('f0', 'O', (3,))])
```
...we see it basically has the same shape as the one built from your `arr`:
```python
>>> np.ndarray(arr.shape[0], dtype=[('', arr.dtype, arr.shape[1])], buffer=arr)
array([['t', 'p', 'g'],
       ['n', 's', 'd'],
       ['g', 'h', 'o'],
       ['g', 'g', 'n'],
       ['f', 'j', 'x']], dtype=[('f0', 'O', (3,))])
```
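To see what this `np.ndarray(..., buffer=...)` call does, here is a minimal sketch that uses a small integer array as an assumed stand-in for the question's `arr` (a plain numeric dtype is a safe thing to pass as `buffer=`). Both variants give one record per row on top of the same memory and differ only in how a row is split into fields; because they are views, sorting the records in place reorders the rows of the original array:

```python
import numpy as np

# Hypothetical integer stand-in for the question's `arr`, rows out of order.
arr = np.array([[7, 8, 9],
                [1, 2, 3],
                [4, 5, 6]], dtype=np.int64)

n_rows, n_cols = arr.shape

# One record per row, holding a single field with a (n_cols,) subarray.
as_subarray = np.ndarray(n_rows, dtype=[('', arr.dtype, n_cols)], buffer=arr)

# One record per row, holding n_cols separate scalar fields.
as_fields = np.ndarray(n_rows, dtype=[('', arr.dtype)] * n_cols, buffer=arr)

print(as_subarray.dtype.names, as_subarray['f0'].shape)
print(as_fields.dtype.names)

# Both arrays share arr's memory, so this in-place sort
# reorders the rows of arr itself.
as_fields.sort()
print(arr)
```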
After sorting the Foo array with `np.ndarray(arr4.shape[0], dtype=[('', arr4.dtype, arr4.shape[1])], buffer=arr4).sort()`, `arr4` looks something like this:
```python
>>> arr4
array([[Foo(1) at 0x000002411F753F08, Foo(2) at 0x000002411F73FF48, Foo(3) at 0x000002411F74EE48],
       [Foo(10) at 0x000002411F756288, Foo(11) at 0x000002411F7562C8, Foo(12) at 0x000002411F756308],
       [Foo(4) at 0x000002411F74EE88, Foo(5) at 0x000002411F74EE08, Foo(6) at 0x000002411F756148],
       [Foo(7) at 0x000002411F7561C8, Foo(8) at 0x000002411F756208, Foo(9) at 0x000002411F756248]], dtype=object)
```
The row starting with `Foo(10)` ends up before the row starting with `Foo(4)`, even though:
```python
>>> Foo(10) > Foo(4)
True
```
(Still, `np.ndarray(arr4.shape[0], dtype=[('', arr4.dtype)] * arr4.shape[1], buffer=arr4).sort()` does produce the expected result, sorted by id using the defined comparison functions.)
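If you'd rather not reinterpret the memory of an object array through `buffer=`, the same one-field-per-column idea can be sketched by copying the rows into a structured array explicitly. This is only a sketch under my own assumptions: `sort_rows` is a hypothetical helper name, and `Foo` is a trimmed-down version of the class above.

```python
import numpy as np


class Foo:
    """Trimmed-down Foo: objects are ordered by their id."""

    def __init__(self, id):
        self._id = id

    def get_id(self):
        return self._id

    def __lt__(self, ob):
        return self.get_id() < ob.get_id()

    def __gt__(self, ob):
        return self.get_id() > ob.get_id()

    def __eq__(self, ob):
        return self.get_id() == ob.get_id()

    def __repr__(self):
        return f'Foo({self.get_id()})'


def sort_rows(arr):
    """Hypothetical helper: return a copy of the 2-D object array `arr`
    with its rows sorted lexicographically by the elements' own comparisons."""
    # One object field per column (the multiplied dtype from point 1).
    dt = np.dtype([('', arr.dtype)] * arr.shape[1])
    # Copy each row into one record instead of reinterpreting arr's memory.
    rec = np.array([tuple(row) for row in arr], dtype=dt)
    # Records are compared field by field: first column, then second, ...
    rec.sort()
    # Turn the list of record tuples back into a 2-D object array.
    return np.array(rec.tolist(), dtype=object)


arr4 = np.array([[Foo(1), Foo(2), Foo(3)],
                 [Foo(10), Foo(11), Foo(12)],
                 [Foo(4), Foo(5), Foo(6)],
                 [Foo(7), Foo(8), Foo(9)]], dtype=object)
print(sort_rows(arr4))
```

Copying costs some memory, but it keeps every element a proper Python object reference and never depends on how numpy exposes an object array as a raw buffer.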
With the subarray dtype, the comparison for `dtype=object` apparently does not use the objects' own comparison functions, as you would expect. The result is consistent with comparing the objects' representations instead: in this case `repr(Foo(10)) < repr(Foo(2))` is `True`, although we would actually expect `Foo(10)` to be greater than `Foo(2)`.
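The mismatch between the two orderings can be reproduced in plain Python. This sketch does not show what numpy does internally; it only shows why a text-based ordering puts `Foo(10)` before `Foo(2)`. It uses a trimmed-down `Foo` with the same `__str__` as above:

```python
class Foo:
    """Trimmed-down Foo: ordered by id, printed as Foo(<id>)."""

    def __init__(self, id):
        self._id = id

    def get_id(self):
        return self._id

    def __lt__(self, ob):
        return self.get_id() < ob.get_id()

    def __str__(self):
        return f'Foo({self.get_id()})'


foos = [Foo(10), Foo(4), Foo(2), Foo(1)]

# Uses Foo.__lt__, i.e. compares the ids numerically.
by_value = [f.get_id() for f in sorted(foos)]

# Compares the strings 'Foo(10)', 'Foo(4)', ... character by character,
# so 'Foo(10)' sorts before 'Foo(2)' because '1' < '2'.
by_text = [f.get_id() for f in sorted(foos, key=str)]

print(by_value)  # [1, 2, 4, 10]
print(by_text)   # [1, 10, 2, 4]
```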
But when you describe every column as its own field, as the multiplied dtype does, numpy compares the rows field by field with the standard comparison operators, which leads to the expected result: it now knows that the elements of a row are individual values of the given type and not just some random objects clenched together into one array. This is also why your example didn't work with strings stored as `dtype=object`, but would work with `str` data stored in numpy's native string dtype (`<U1`).
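For the string case, here is a minimal sketch of the native-dtype route, assuming the single-character rows shown earlier for `arr`: store them as `<U1` instead of `object`, view each row as one record with a `<U1` field per column, and sort. Using `.view()` instead of `np.ndarray(..., buffer=...)` is my own choice here, and it assumes the array is C-contiguous.

```python
import numpy as np

rows = [['t', 'p', 'g'],
        ['n', 's', 'd'],
        ['g', 'h', 'o'],
        ['g', 'g', 'n'],
        ['f', 'j', 'x']]

# Native fixed-width unicode dtype instead of dtype=object.
a = np.array(rows, dtype='<U1')

# One '<U1' field per column. The view shares memory with `a`;
# its shape becomes (5, 1) because each row collapses into one record.
fields = a.view([('', a.dtype)] * a.shape[1])

# Sort the records along axis 0: rows are compared field by field,
# and sorting the view in place reorders the rows of `a`.
fields.sort(axis=0)
print(a)
```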