Finding a numpy mode vector

Question

I have a numpy array of one-hot vectors. I want to find the mode of these one-hot vectors. Note that this is not equivalent to finding the mode over the values.

e.g. for

x = [[0,0,0,1],
     [0,0,0,1],
     [0,0,1,0],
     [0,1,0,0],
     [1,0,0,0]]

assert np.array_equal(vector_mode(x), [0,0,0,1])
assert np.array_equal(np.ravel(scipy.stats.mode(x).mode), [0,0,0,0])

What is the most efficient way to do this with numpy/scipy?

You'll probably end up having to lexsort it and find the longest run of equal rows.
– user2357112, Commented Sep 7, 2017 at 18:25
The key here is that these are one-hot vectors. Makes life muuuuch easier.
– Mad Physicist, Commented Sep 7, 2017 at 18:26
While we're on the topic, I'd like to point out that scipy.stats.mode has a loop in it that compares every value found in the array to the entire array, which can cause surprisingly bad performance for an array with a lot of distinct values in it. For example, scipy.stats.mode(range(10**5)) is appallingly slow.
– user2357112, Commented Sep 7, 2017 at 18:37
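
For rows that are not guaranteed to be one-hot, the sort-and-count idea from the first comment can be sketched with np.unique, which sorts the rows and counts each distinct one. This is only an illustration: the name row_mode is hypothetical, and it assumes a NumPy version in which np.unique accepts the axis argument (1.13 or later).

```python
import numpy as np

def row_mode(a):
    """Return the most frequent row of a 2D array (hypothetical helper)."""
    a = np.asarray(a)
    # With axis=0, np.unique sorts the rows and returns each distinct row once,
    # together with the number of times it occurs.
    rows, counts = np.unique(a, axis=0, return_counts=True)
    # On ties, argmax picks the first maximum, i.e. the first tied row in sorted order.
    return rows[counts.argmax()]

x = np.array([[0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])
print(row_mode(x))  # [0 0 0 1]
```

This does more work than the one-hot shortcut, since it has to sort whole rows, so it is mainly useful when the one-hot assumption does not hold.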

Divakar · Accepted Answer · 2017-09-07 18:33:28Z

2

We are dealing with one-hot vectors as rows of the 2D input array, so the argmax of each row uniquely identifies that one-hot vector. Get those indices, then get their counts. Any one of the rows whose argmax has the highest count is a valid mode row. Let's pick the first of those with one more use of argmax and finally index into the 2D input.

Hence, one implementation -

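import numpy as np

idx = np.argmax(x, axis=1)   # hot index of each row
count = np.bincount(idx)     # number of occurrences of each hot index
out = x[(idx == count.argmax()).argmax()]   # first row with the most common hot index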
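
The steps above can be wrapped into the vector_mode function the question asks for. This is a minimal sketch that assumes every row is a valid one-hot vector (exactly one 1 per row); that assumption is not checked, and the np.asarray conversion is an addition for convenience.

```python
import numpy as np

def vector_mode(x):
    """Most common row of a 2D array of one-hot vectors.

    Assumes every row contains exactly one 1; this is not verified.
    """
    x = np.asarray(x)
    idx = np.argmax(x, axis=1)   # hot index of each row
    count = np.bincount(idx)     # how often each hot index occurs
    # Index of the first row whose hot index is the most frequent one.
    return x[(idx == count.argmax()).argmax()]

x = np.array([[0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])
assert np.array_equal(vector_mode(x), [0, 0, 0, 1])
```

If several hot indices are tied for the highest count, count.argmax() returns the smallest of them, so the result is deterministic.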

edited Sep 7, 2017 at 18:33

answered Sep 7, 2017 at 18:27

Divakar


Mad Physicist · Accepted Answer · 2017-09-07 18:25:40Z

1

If your vectors are one-hot, you can just use argmax to get the index of the hotspot and compute the mode of those:

import numpy as np
import scipy.stats
hot = np.argmax(x, axis=1)          # hotspot index of each row
mode = scipy.stats.mode(hot).mode   # most common hotspot index

In this case, mode is 3, meaning that the most common vector has a hotspot in index 3.

If you want to turn this back into a one-hot vector, you can do:

vec = np.zeros(4)
vec[mode] = 1
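
A variant of the same idea that avoids hardcoding the vector length might look like the sketch below. Note that it swaps scipy.stats.mode for np.bincount(...).argmax(), which is a substitution and not what this answer uses, and it assumes x is a 2D NumPy array of one-hot rows.

```python
import numpy as np

x = np.array([[0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])

hot = np.argmax(x, axis=1)         # hotspot index of each row
mode = np.bincount(hot).argmax()   # most common hotspot (stands in for scipy.stats.mode)

# Take the width and dtype from x instead of hardcoding 4.
vec = np.zeros(x.shape[1], dtype=x.dtype)
vec[mode] = 1
print(vec)  # [0 0 0 1]
```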

answered Sep 7, 2017 at 18:25

Mad Physicist
