I have a numpy array of one-hot vectors, one per row. I want to find the mode of these vectors, that is, the row that occurs most often. Note that this is not equivalent to finding the mode over the values in each column.
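For example:
import numpy as np
import scipy.stats

x = np.array([[0,0,0,1],
              [0,0,0,1],
              [0,0,1,0],
              [0,1,0,0],
              [1,0,0,0]])
# vector_mode is the function I am looking for
assert (vector_mode(x) == [0,0,0,1]).all()
# scipy.stats.mode takes the mode of each column separately
assert (np.ravel(scipy.stats.mode(x, axis=0).mode) == [0,0,0,0]).all()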
What is the most efficient way to do this with numpy/scipy?
scipy.stats.mode has a loop in it that compares every value found in the array to the entire array, which can cause surprisingly bad performance for an array with a lot of distinct values in it. For example, scipy.stats.mode(range(10**5)) is appallingly slow.
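
To make the desired behaviour concrete, here is a minimal sketch of two ways it might be done. It is not a benchmarked answer. The names vector_mode and row_mode are hypothetical, and the first version assumes every row really is one-hot (exactly one 1 per row). In that case the column sums count how often each vector occurs, so no per-value loop is needed. The second version makes no one-hot assumption and counts whole rows with np.unique.

```python
import numpy as np

def vector_mode(x):
    """Most frequent one-hot row of x (hypothetical helper).

    Assumes each row has exactly one 1; column sums then equal
    the number of occurrences of each one-hot vector.
    """
    x = np.asarray(x)
    counts = x.sum(axis=0)             # occurrences per hot position
    mode = np.zeros(x.shape[1], dtype=x.dtype)
    mode[counts.argmax()] = 1          # ties resolve to the lowest index
    return mode

def row_mode(x):
    """Most frequent row of x, without assuming one-hot rows."""
    x = np.asarray(x)
    rows, counts = np.unique(x, axis=0, return_counts=True)
    return rows[counts.argmax()]       # ties resolve to the first sorted row

x = np.array([[0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])
print(vector_mode(x))
print(row_mode(x))
```

The column-sum version is a single pass over the array. The np.unique version has to sort the rows, so it is likely slower, but it still works when the rows are arbitrary vectors.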