
I have a NumPy array and want to output the unique rows, based on the value of the first element in each row. I can partially succeed in getting the first values of the unique rows, but not the full rows, e.g.:

import numpy as np

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857)])


print('Original Array :' , dataA)

# Get unique values from complete 2D array
uniqueValues = np.unique(dataA)

print('Unique Values : ', uniqueValues)

# Get unique rows from numpy array
uniqueRows = np.unique(dataA[:,0], axis=0)

print('Unique Rows : ', uniqueRows, sep='\n')

This gives:

Unique Rows : 
[107. 108. 109.]

Desired result:
[(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874)]
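A minimal sketch of one way to get this result, assuming the first row seen for each first-column value is the one to keep: np.unique can also return the index of the first occurrence of each unique value (return_index=True), and those indices can then select the full rows. This uses np.unique's return_index option, which is a different technique from the np.roll comparison in the answer below.

```python
import numpy as np

dataA = np.array([
    (107., 7.475729, 6.573791, 90.0126, 0.5529882, 0.867588),
    (107., 7.408565, 6.38974, 89.97312, 0.553728, 0.8670179),
    (108., 7.838725, 6.961871, 89.52572, 0.5610707, 0.7769735),
    (108., 7.795123, 7.054095, 89.62989, 0.5592708, 0.7742778),
    (109., 7.079929, 6.86194, 89.6181, 0.5660294, 0.8596874),
    (109., 7.058383, 6.671512, 89.52995, 0.5663874, 0.8610857),
])

# keys: sorted unique values of column 0
# first_idx: index in dataA of the first row holding each key
keys, first_idx = np.unique(dataA[:, 0], return_index=True)

# Fancy-index the full rows; the result is ordered by key value
uniqueRows = dataA[first_idx]
print(uniqueRows)
```

If you would rather keep the rows in their original order instead of sorted by key, index with np.sort(first_idx) instead.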

Even though the above gets me the row IDs (the unique first-column values), it seems to fail when the array contains NaNs:

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 , np.nan, np.nan),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179, np.nan, np.nan),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735, np.nan, np.nan),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778, np.nan, np.nan),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874, np.nan, np.nan),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857, np.nan, np.nan),
 (110.,  7.727924,  7.116364, 90.45003, 0.5366358, 0.8887361, np.nan, np.nan),
 (110.,  7.748454,  7.223625, 90.6782 , 0.5349852, 0.8855141, np.nan, np.nan)])
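As a sketch, the np.unique(..., return_index=True) approach can be checked against this second array. Only column 0 is compared, so the NaN values in the trailing columns are just carried along with each selected row; this assumes column 0 itself never contains NaN. NaN is written as np.nan here, since a bare nan is not defined unless you bind it yourself.

```python
import numpy as np

nan = np.nan  # local alias so the rows read like the question's data
dataA = np.array([
    (107., 7.475729, 6.573791, 90.0126, 0.5529882, 0.867588, nan, nan),
    (107., 7.408565, 6.38974, 89.97312, 0.553728, 0.8670179, nan, nan),
    (108., 7.838725, 6.961871, 89.52572, 0.5610707, 0.7769735, nan, nan),
    (108., 7.795123, 7.054095, 89.62989, 0.5592708, 0.7742778, nan, nan),
    (109., 7.079929, 6.86194, 89.6181, 0.5660294, 0.8596874, nan, nan),
    (109., 7.058383, 6.671512, 89.52995, 0.5663874, 0.8610857, nan, nan),
    (110., 7.727924, 7.116364, 90.45003, 0.5366358, 0.8887361, nan, nan),
    (110., 7.748454, 7.223625, 90.6782, 0.5349852, 0.8855141, nan, nan),
])

# Only column 0 is compared, so NaNs in other columns do not affect which rows are picked
keys, first_idx = np.unique(dataA[:, 0], return_index=True)
uniqueRows = dataA[first_idx]
print(uniqueRows)
```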
  • np.unique(dataA[:,0], axis=0) doesn't give you unique rows, but unique values in the first column. Commented Jul 24, 2019 at 8:06
  • I see where I was going wrong now, thanks yatu!! Commented Jul 24, 2019 at 8:44
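To make the distinction in the first comment concrete, here is a small sketch on a hypothetical three-row, three-column slice of the data: unique on column 0 returns a 1-D array of values, while unique with axis=0 on the whole array compares complete rows, so it only removes rows that are identical in every column.

```python
import numpy as np

# Hypothetical trimmed subset of the question's data (first three columns only)
dataA = np.array([
    (107., 7.475729, 6.573791),
    (107., 7.408565, 6.38974),
    (108., 7.838725, 6.961871),
])

# Unique values of the first column: a 1-D array, not rows
vals = np.unique(dataA[:, 0])

# Unique complete rows: no two rows match in every column, so all three survive
rows = np.unique(dataA, axis=0)

print(vals)
print(rows)
```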

1 Answer

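You could check where in the array the first value in a row is equal to that of the next row, and index based on the result. This relies on the rows being grouped by their first value, with each value appearing exactly twice; otherwise it can drop or duplicate groups: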

dataA[dataA[:, 0] == np.roll(dataA, -1, axis=0)[:, 0]]

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874]])
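To see what the mask does, and why it depends on each value appearing exactly twice, here is a small sketch on a hypothetical key column where 108 appears only once and 109 appears three times.

```python
import numpy as np

# Hypothetical first-column values: 107 twice, 108 once, 109 three times
keys = np.array([107., 107., 108., 109., 109., 109.])

# True where a row's key equals the key of the next row (np.roll wraps around at the end)
mask = keys == np.roll(keys, -1)

print(mask)        # which rows are kept
print(keys[mask])  # 108 is dropped and 109 is kept twice
```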

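If the rows are not ordered by their first value, sort them first (a stable sort keeps rows with equal keys in their original order) and apply the same comparison to the sorted array:

s = dataA[:, 0].argsort(kind='stable')
sortedA = dataA[s]
sortedA[sortedA[:, 0] == np.roll(sortedA, -1, axis=0)[:, 0]]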

For the second example it yields:

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ,         nan,         nan],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735,         nan,         nan],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874,         nan,         nan],
       [110.       ,   7.727924 ,   7.116364 ,  90.45003  ,   0.5366358,
          0.8887361,         nan,         nan]])
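A variant that does not depend on groups being pairs, sketched under the assumption that the first row (in original order) for each value should be kept and that the key column contains no NaN: stable-sort by the first column, then keep each row whose key differs from the row before it. This swaps the np.roll "equal to next row" test for a "different from previous row" test; first_rows_by_key is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def first_rows_by_key(data, col=0):
    """Hypothetical helper: return the first row (in original order) for each
    distinct value in data[:, col], with the result sorted by that value."""
    if len(data) == 0:
        return data
    # A stable sort keeps rows with equal keys in their original relative order
    order = np.argsort(data[:, col], kind="stable")
    sorted_data = data[order]
    k = sorted_data[:, col]
    # A row starts a new group if it is the first row or its key differs from the previous one
    starts = np.concatenate(([True], k[1:] != k[:-1]))
    return sorted_data[starts]

# Hypothetical unsorted input with groups of different sizes
data = np.array([
    [109., 1.0],
    [107., 2.0],
    [108., 3.0],
    [107., 4.0],
    [109., 5.0],
    [109., 6.0],
])
result = first_rows_by_key(data)
print(result)
```

Because each group is detected by comparing neighbouring keys after sorting, single rows and groups of three or more are handled the same way as pairs.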

  • Hi yatu, it works fine until I use the second set of data with the NaNs.
