Consider the NumPy array below. I'm hoping to find a fast way to remove the rows that do not have 4 distinct values.

import numpy as np

D = np.array([[2, 3, 6, 7],
              [2, 4, 3, 4],
              [4, 9, 0, 1],
              [5, 5, 2, 5],
              [7, 5, 4, 8],
              [7, 5, 4, 7]])

For the small sample array shown, the output should be:

D = np.array([[2, 3, 6, 7],
              [4, 9, 0, 1],
              [7, 5, 4, 8]])

2 Answers

Here's one way -

In [94]: s = np.sort(D,axis=1)

In [95]: D[(s[:,:-1] == s[:,1:]).sum(1) ==0]
Out[95]: 
array([[2, 3, 6, 7],
       [4, 9, 0, 1],
       [7, 5, 4, 8]])

Alternatively -

In [107]: D[~(s[:,:-1] == s[:,1:]).any(1)]
Out[107]: 
array([[2, 3, 6, 7],
       [4, 9, 0, 1],
       [7, 5, 4, 8]])

Or -

In [112]: D[(s[:,:-1] != s[:,1:]).all(1)]
Out[112]: 
array([[2, 3, 6, 7],
       [4, 9, 0, 1],
       [7, 5, 4, 8]])

With pandas -

In [121]: import pandas as pd

In [122]: D[pd.DataFrame(D).nunique(1)==4]
Out[122]: 
array([[2, 3, 6, 7],
       [4, 9, 0, 1],
       [7, 5, 4, 8]])

4 Comments

As a relative newcomer to Numpy, I appreciate the multiple approaches.
Your solutions worked great and were very fast. But I'm puzzled by the 1 argument used in .sum(1), .any(1), and .all(1). In each of these, I thought a single argument would have to be the array (in these cases, s). Could this be explained?
@user109387 Those are just axis arguments. So, those are equivalent to .sum(axis=1) and so on. Does that make sense?
Yes, that's fine. It's just that I thought the array name argument was mandatory.
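
To illustrate the axis point from the comments: when sum, any, or all is called as a method, the array is the object the method is called on, so the first positional argument is the axis. The array only appears as an argument in the function form, such as np.sum(s, axis=1). A small sketch, using a made-up boolean array s rather than the one from the answer:

```python
import numpy as np

# Small made-up boolean array, standing in for the comparison result
s = np.array([[True, False, False],
              [False, False, False]])

by_method_positional = s.sum(1)        # method form, axis passed positionally
by_method_keyword = s.sum(axis=1)      # method form, axis passed by keyword
by_function = np.sum(s, axis=1)        # function form, array passed explicitly

any_per_row = s.any(1)                 # True where a row has at least one True
all_per_row = s.all(1)                 # True where a row is entirely True
```

All three sum calls reduce along the columns and give one value per row.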

A working answer with np.unique

I found no way to use the axis keyword in np.unique to get rid of the list comprehension. Perhaps someone can help?

D[np.array([np.unique(row, return_counts=True)[1].max() for row in D]) == 1]
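
On the axis question: as far as I know, the axis keyword of np.unique treats whole rows or columns as the items to deduplicate, so it does not return the unique values within each row. One possible way to avoid the list comprehension is to count distinct values per row by sorting and counting the positions where neighbours differ. This is a sketch of a different technique than np.unique, and the name n_distinct is only for illustration.

```python
import numpy as np

D = np.array([[2, 3, 6, 7],
              [2, 4, 3, 4],
              [4, 9, 0, 1],
              [5, 5, 2, 5],
              [7, 5, 4, 8],
              [7, 5, 4, 7]])

s = np.sort(D, axis=1)                               # duplicates become adjacent
# Each nonzero step between sorted neighbours starts a new distinct value;
# add 1 to account for the first value in the row.
n_distinct = (np.diff(s, axis=1) != 0).sum(axis=1) + 1

result = D[n_distinct == D.shape[1]]                 # rows with all values distinct
```

Having the per-row count also makes it easy to filter on other thresholds, for example rows with at least 3 distinct values.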
