462

What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?

Simply using == gives me a boolean array:

 >>> numpy.array([1,1,1]) == numpy.array([1,1,1])

array([ True,  True,  True], dtype=bool)

Do I have to AND the elements of this array together to determine whether the arrays are equal, or is there a simpler way to compare them?

9 Answers

682
(A==B).all()

This tests whether all values of the array (A==B) are True.

Note: you may also want to test the shapes of A and B, for example A.shape == B.shape.

Special cases and alternatives (from dbaupp's answer and yoavram's comment)

It should be noted that:

  • This solution can behave strangely in one particular case: if either A or B is empty and the other one contains a single element, it returns True. The comparison A==B broadcasts to an empty array, and all() over an empty array returns True.
  • Another risk: if A and B don't have the same shape and aren't broadcastable, this approach raises an error.
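The empty-array case above can be reproduced with a short sketch. The helper same_shape_and_values is a hypothetical name used only for illustration (it is not a NumPy function); it simply checks the shapes before the values, as suggested in the note above.

```python
import numpy as np

A = np.array([1])
B = np.array([])

# Shapes (1,) and (0,) broadcast to (0,), so the comparison result is empty.
cmp = (A == B)
print(cmp.shape)  # (0,)
print(cmp.all())  # True: all() over zero elements is vacuously True


def same_shape_and_values(a, b):
    """Hypothetical helper: compare shapes first, then element values."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and bool((a == b).all())


print(same_shape_and_values(A, B))  # False
```

Checking the shape first also avoids the broadcasting error from the second bullet, because the element comparison only runs when the shapes already match.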

In conclusion, if you have any doubt about the shapes of A and B, or simply want to be safe, use one of the specialized functions:

np.array_equal(A,B)  # test if same shape, same element values
np.array_equiv(A,B)  # test if broadcastable shapes, same element values
np.allclose(A,B,...) # test if broadcastable shapes, element values close within a tolerance
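As a rough illustration of how the three functions differ, here is a small sketch. The array a and the offset 1e-9 are arbitrary values chosen for the example; the tolerance used by np.allclose is its default.

```python
import numpy as np

a = np.array([1, 1, 1])

print(np.array_equal(a, [1, 1, 1]))  # True: same shape, same values
print(np.array_equal(a, 1))          # False: shapes (3,) and () differ
print(np.array_equiv(a, 1))          # True: the scalar broadcasts to shape (3,)

b = a + 1e-9                         # tiny offset, float64 result
print(np.array_equal(a, b))          # False: values are not exactly equal
print(np.allclose(a, b))             # True: within the default tolerance
```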

8 Comments

You've got a good point, but when I have a doubt about the shape I usually prefer to test it directly, before the values. Then the error is clearly about the shapes, which has a completely different meaning from having different values. But that probably depends on each use case.
Another risk is if the arrays contain nan. In that case you will get False because nan != nan.
Good to point it out. However, I think this is logical because nan!=nan implies that array(nan)!=array(nan).
I do not understand this behavior:
import numpy as np
H = 1/np.sqrt(2)*np.array([[1, 1], [1, -1]])  # Hadamard matrix
np.array_equal(H.dot(H.T.conj()), np.eye(len(H)))  # checking whether H is a unitary matrix
H is a unitary matrix, so H x H.T.conj is an identity matrix. But np.array_equal returns False.
This should be asked as a separate question. But the answer is simple: the values of H * H.T.conj are close but not equal to the identity matrix because of numeric precision. Use np.allclose.
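On the nan point raised in the comments above, a minimal sketch of the behavior follows. It assumes NumPy 1.19 or later, where np.array_equal accepts the equal_nan keyword; np.allclose has had the same keyword for longer.

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0, np.nan])

print((a == b).all())                        # False, because nan != nan
print(np.array_equal(a, b))                  # False for the same reason
print(np.array_equal(a, b, equal_nan=True))  # True: nan positions count as equal
print(np.allclose(a, b, equal_nan=True))     # True
```

Whether nan should compare equal to nan depends on the use case, so the keyword is opt-in.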
139

The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.

(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)

5 Comments

you're right, except that if one of the compared arrays is empty you'll get the wrong answer with (A==B).all(). For example, try: (np.array([1])==np.array([])).all(), it gives True, while np.array_equal(np.array([1]), np.array([])) gives False
I just discovered this performance difference too. It's strange because if you have 2 arrays that are completely different (a==b).all() is still faster than np.array_equal(a, b) (which could have just checked a single element and exited).
np.array_equal also works with lists of arrays and dicts of arrays. This might be a reason for a slower performance.
Thanks a lot for the function allclose, that is what I needed for numerical calculations. It compares the equality of vectors within a tolerance. :)
Note that np.array_equiv([1,1,1], 1) is True. This is because, per the documentation, "shape consistent" means the inputs are either the same shape, or one input array can be broadcast to create the same shape as the other one.
26

If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.

Performance-wise, don't expect any equality check to beat another, as there is not much room to optimize comparing two elements. Just for the sake of it, I still ran some tests.

import numpy as np
import timeit

A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))

timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761

So pretty much equal, no need to talk about the speed.

The (A==B).all() behaves pretty much as the following code snippet:

x = [1,2,3]
y = [1,2,3]
print(all([x[i]==y[i] for i in range(len(x))]))
> True


18

Let's measure the performance by using the following piece of code.

import numpy as np
import time

exec_time0 = []
exec_time1 = []
exec_time2 = []

sizeOfArray = 5000
numOfIterations = 200

for i in range(numOfIterations):

    A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
    B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))

    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
    a = time.perf_counter()
    res = (A==B).all()
    b = time.perf_counter()
    exec_time0.append( b - a )

    a = time.perf_counter()
    res = np.array_equal(A,B)
    b = time.perf_counter()
    exec_time1.append( b - a )

    a = time.perf_counter()
    res = np.array_equiv(A,B)
    b = time.perf_counter()
    exec_time2.append( b - a )

print('Method: (A==B).all(),       ', np.mean(exec_time0))
print('Method: np.array_equal(A,B),', np.mean(exec_time1))
print('Method: np.array_equiv(A,B),', np.mean(exec_time2))

Output

Method: (A==B).all(),        0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515

According to the results above, the NumPy functions seem to be marginally faster than combining the == operator with the all() method, and of the two NumPy functions, numpy.array_equal seems to be the faster one.

3 Comments

You should use a larger array size that takes at least a second to run, to increase the accuracy of the experiment.
Does this also reproduce when the order of the comparisons is changed, or when A and B are reinitialized to random values each time? The difference might also be explained by memory caching of the A and B cells.
There's no meaningful difference between these timings.
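Following up on the comments above, one way to reduce noise is to let timeit repeat each measurement and keep the best run. This is only a sketch: the array size, seed, and repeat counts are arbitrary choices for illustration, and the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 255, size=(500, 500))
B = A.copy()  # equal arrays, so no method can stop early on a mismatch

candidates = {
    "(A==B).all()": lambda: (A == B).all(),
    "np.array_equal(A, B)": lambda: np.array_equal(A, B),
    "np.array_equiv(A, B)": lambda: np.array_equiv(A, B),
}

number = 20
results = {}
for name, fn in candidates.items():
    # repeat() returns one total time per run; the minimum is the least noisy
    totals = timeit.repeat(fn, repeat=5, number=number)
    results[name] = min(totals) / number
    print(f"{name:<22} {results[name]:.6f} s per call")
```

Taking the minimum of several runs, rather than the mean of single-shot timings, makes it less likely that caching or background load decides the ranking.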
10

Usually two arrays will have some small numeric errors.

You can use numpy.allclose(A,B) instead of (A==B).all(). This returns a bool, True or False.
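A small sketch of why this matters, reusing the Hadamard matrix example from the comments on the first answer. The exact comparison depends on floating-point rounding, so its result is not stated here; the tolerance-based comparison uses the default rtol and atol of np.allclose.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard matrix
product = H @ H.conj().T                      # mathematically the identity

# Exact comparison: may fail because of rounding in the last bits
print(np.array_equal(product, np.eye(2)))

# Comparison within a tolerance: robust to small rounding errors
print(np.allclose(product, np.eye(2)))        # True
```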


5

Now use np.array_equal. From the documentation:

>>> np.array_equal([1, 2], [1, 2])
True
>>> np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
>>> np.array_equal([1, 2], [1, 2, 3])
False
>>> np.array_equal([1, 2], [1, 4])
False

1 Comment

1

On top of the other answers, you can now use an assertion:

numpy.testing.assert_array_equal(x, y)

There are also similar functions, such as numpy.testing.assert_almost_equal().

https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
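A minimal sketch of how the assertion style behaves, mainly useful in tests. Note that assert_allclose is a different function from the assert_almost_equal mentioned above; it is used here as the tolerance-based counterpart, and the sample values are arbitrary.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_equal

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])

assert_array_equal(x, y)  # equal arrays: returns None, raises nothing

mismatch_reported = False
try:
    assert_array_equal(x, np.array([1, 2, 4]))
except AssertionError:
    # The exception message describes which elements differ
    mismatch_reported = True
print(mismatch_reported)  # True

# Tolerance-based assertion for floating-point results
assert_allclose(0.1 + 0.2, 0.3)
```

Unlike np.array_equal, these functions raise instead of returning a bool, so they fit test suites better than control flow.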


0

Just for the sake of completeness, I will add the pandas approach for comparing two arrays:

import pandas as pd
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)

ap.equals(bp)
True

FYI: in case you are looking for how to compare vectors, arrays or data frames in R, you can use:

identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE 

1 Comment

What is pd, then? You should mention that you're adding an additional dependency on pandas.
0
A=np.array([1,2,3,4])
B=np.array([1,2,3,4])

sum(A!=B)==0

The idea is that the total number of unequal elements should be zero.
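One caveat, shown as a sketch: the built-in sum iterates over the first axis, so for arrays with more than one dimension it returns an array rather than a single count. np.count_nonzero is one way to get a scalar for any shape; the 2-D arrays below are made up for the example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 5]])

# Built-in sum adds the rows together, giving one count per column
per_column = sum(A != B)
print(per_column)

# count_nonzero counts unequal elements over the whole array
n_unequal = np.count_nonzero(A != B)
print(n_unequal)       # 1
print(n_unequal == 0)  # False: the arrays differ in one element
```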

