Get unique rows from Numpy Array based on a value within the row

Question

I have a NumPy array and want to extract the unique rows from it, based on the value of the first element in each row. I can get the unique first values, but not the full rows, e.g.

import numpy as np

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857)])


print('Original Array :' , dataA)

# Get unique values from complete 2D array
uniqueValues = np.unique(dataA)

print('Unique Values : ', uniqueValues)

# Get unique rows from  numpy array
uniqueRows = np.unique(dataA[:,0], axis=0)

print('Unique Rows : ', uniqueRows, sep='\n')

This gives:

Unique Rows : 
[107. 108. 109.]

Desired result:
[(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874)]

Even though the above works to the point that it gives me the row IDs, it seems to fail when the array contains NaNs:

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 , np.nan, np.nan),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179, np.nan, np.nan),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735, np.nan, np.nan),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778, np.nan, np.nan),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874, np.nan, np.nan),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857, np.nan, np.nan),
 (110.,  7.727924,  7.116364, 90.45003, 0.5366358, 0.8887361, np.nan, np.nan),
 (110.,  7.748454,  7.223625, 90.6782 , 0.5349852, 0.8855141, np.nan, np.nan)])

np.unique(dataA[:,0], axis=0) doesn't give you unique rows, but unique values in the first column.
– Nils Werner, commented Jul 24, 2019 at 8:06
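
Building on that comment, one possible way to get from the unique first-column values back to full rows is to ask np.unique for the position of each value's first occurrence (return_index=True) and index the array with those positions. The following is only a sketch on a shortened, hypothetical subset of the question's data (fewer rows and columns); the variable names are illustrative and not from the original post. Because only the first column is passed to np.unique, NaNs in the other columns do not take part in the comparison.

```python
import numpy as np

# Shortened, hypothetical subset of the question's data
data = np.array([
    [107., 7.475729, 6.573791, np.nan],
    [107., 7.408565, 6.38974,  np.nan],
    [108., 7.838725, 6.961871, np.nan],
    [108., 7.795123, 7.054095, np.nan],
    [109., 7.079929, 6.86194,  np.nan],
])

# return_index=True also returns the index of the first occurrence
# of each unique value in the first column
keys, first_idx = np.unique(data[:, 0], return_index=True)

# Use those indices to pull out the complete rows
unique_rows = data[first_idx]
print(unique_rows)
```

The rows come back ordered by the sorted key values, which matches the original order only when the first column is already sorted.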

yatu · Accepted Answer · 2019-07-24 08:27:53Z

You could check where in the array the first value in a row is equal to that of the next row, and index based on the result:

dataA[dataA[:, 0] == np.roll(dataA, -1, axis=0)[:, 0]]

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874]])

If the rows are not ordered based on the first value, instead use:

s = dataA[:, 0].argsort()
dataA[s][dataA[s, 0] == np.roll(dataA[s, 0], -1)]

For the second example it yields:

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ,         nan,         nan],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735,         nan,         nan],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874,         nan,         nan],
       [110.       ,   7.727924 ,   7.116364 ,  90.45003  ,   0.5366358,
          0.8887361,         nan,         nan]])
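
Note that comparing each row with the next one keeps a row only when the following row has the same first value, which fits the data in the question, where every value appears exactly twice. If a value could appear once or more than twice, a variation is to keep each row whose first value differs from the previous row's. The sketch below assumes that variation; first_row_per_key is a hypothetical helper name and the sample data is made up for illustration.

```python
import numpy as np

def first_row_per_key(arr, col=0):
    """Hypothetical helper: keep the first row for each distinct value in column `col`."""
    # A stable sort keeps rows with equal keys in their original relative order
    order = np.argsort(arr[:, col], kind="stable")
    sorted_arr = arr[order]
    keys = sorted_arr[:, col]
    # Mark a row as "first" when its key differs from the previous row's key;
    # the very first row (if any) is always kept
    is_first = np.ones(len(keys), dtype=bool)
    is_first[1:] = keys[1:] != keys[:-1]
    return sorted_arr[is_first]

# Made-up, unsorted sample: 107 appears twice, 108 three times, 109 once
data = np.array([
    [108., 1.0, np.nan],
    [107., 2.0, np.nan],
    [108., 3.0, np.nan],
    [109., 4.0, np.nan],
    [108., 5.0, np.nan],
    [107., 6.0, np.nan],
])

result = first_row_per_key(data)
print(result)
```

Only the key column is compared, so NaNs elsewhere in the rows do not affect which rows are selected.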

Hi yatu, it works fine until I use the second set of data with the NaNs.
