Fast absolute difference of two uint8 arrays

Question

I have two numpy-arrays with dtype=np.uint8 - like this:

img1=np.uint8(np.random.randint(0, 255, (480, 640)))
img2=np.uint8(np.random.randint(0, 255, (480, 640)))

And I want to build the positive difference of these arrays.

Here are my first two approches (and a third one for reference):

def differenceImageV1(img1, img2):
  diff=np.empty_like(img1)
  h, w=img1.shape
  for y in range(h):
    for x in range(w):
      if img1[y, x]<img2[y, x]: diff[y, x]=img2[y, x]-img1[y, x]
      else:                     diff[y, x]=img1[y, x]-img2[y, x]
  return(diff)

def differenceImageV2(img1, img2):
  return(np.uint8(np.absolute(np.int16(img1)-np.int16(img2))))

def differenceImageV3(img1, img2):  # fast - but wrong result
  return(img1-img2)

I get these execution times (and the sums to check, if they are equal):

  10x: 1893.54 ms  np.sum=26122208
1000x:  411.71 ms  np.sum=26122208
1000x:   26.60 ms  np.sum=39123624

Is there a way to get a correct result faster as with V2 ?

Not sure if it would really help that much, but you likely don't need the np.int16(img2) in differenceImageV2, and can just use img2. Also, are you using the timeit library for accurate timing results? — Alyssa Haroldsen
– Alyssa Haroldsen, Commented Mar 3, 2016 at 17:15
I'm aware of the overflow (underflow is when the difference between two floating point numbers is too small to become distinguishable). I mean the full statement would be return(np.uint8(np.absolute(np.int16(img1)-img2))). img1 is still casted to an int16, so the result will be an int16 and can allow the same negative numbers that doing the cast beforehand will do. np.sum gives the same result. It just doesn't take a copy of the entire array first. I shave off about 70 ms for 1000 loops. — Alyssa Haroldsen
– Alyssa Haroldsen, Commented Mar 3, 2016 at 17:36
you are right. I get the same result, if I do a np.int16(img1)-img2. The execution time goes down to 400.39 ms for 1000 loops. And no, I use time.process_time() here, because I do not care about one or ten milliseconds. — dede
– dede, Commented Mar 3, 2016 at 17:40
@dede: See my answer. Since I'm getting a 40x gap between V2 and V3 in my testing (way bigger than your 15x gap), I'd like to see how my solution performs on your testing setup. — nneonneo
– nneonneo, Commented Mar 3, 2016 at 18:06
@nneonneo: I get these values: V1( 10x: 1897.16 ms), V2(1000x: 412.73 ms), V3(1000x: 28.60 ms), V4(1000x: 1369.41 ms), V5(1000x: 253.97 ms), V6(1000x: 183.48 ms). Was this the data, you asked for? — dede
– dede, Commented Mar 3, 2016 at 18:40

nneonneo · Accepted Answer · 2016-03-03 18:03:24Z

Here's one approach that is significantly faster than V2: take img1-img2, and multiply by 1 or -1 depending on img1>img2. Here's how it is implemented:

def differenceImageV6(img1, img2):
  a = img1-img2
  b = np.uint8(img1<img2) * 254 + 1
  return a * b

A test harness for testing performance:

import numpy as np

img1=np.uint8(np.random.randint(0, 255, (480, 640)))
img2=np.uint8(np.random.randint(0, 255, (480, 640)))

def differenceImageV1(img1, img2):
  diff=np.empty_like(img1)
  h, w=img1.shape
  for y in range(h):
    for x in range(w):
      if img1[y, x]<img2[y, x]: diff[y, x]=img2[y, x]-img1[y, x]
      else:                     diff[y, x]=img1[y, x]-img2[y, x]
  return(diff)

def differenceImageV2(img1, img2):
  return(np.uint8(np.abs(np.int16(img1)-img2)))

def differenceImageV3(img1, img2):  # fast - but wrong result
  return(img1-img2)

def differenceImageV4(img1, img2):
  return np.where(img1>img2, img1-img2, img2-img1)

def differenceImageV5(img1, img2):
  a = img1-img2
  b = img2-img1
  c = img1>img2
  return a*c + b*(~c)

def differenceImageV6(img1, img2):
  a = img1-img2
  b = np.uint8(img1<img2) * 254 + 1
  return a * b

import timeit
def testit():
  for fn in [differenceImageV2, differenceImageV3, differenceImageV4, differenceImageV5, differenceImageV6]:
    print fn.__name__, np.sum(fn(img1, img2).astype('int64')),
    print timeit.timeit("%s(img1, img2)" % fn.__name__, "from test import img1, img2, %s" % fn.__name__, number=1000)

if __name__ == '__main__':
    testit()

and resulting performance numbers:

differenceImageV2 26071358 0.982538938522
differenceImageV3 39207702 0.0261280536652
differenceImageV4 26071358 1.36270809174
differenceImageV5 26071358 0.220561981201
differenceImageV6 26071358 0.154536962509

differenceImageV6 is about 6x slower than the incorrect differenceImageV3, but still about 6x faster than the previous best differenceImageV2. differenceImageV1 isn't tested because it's easily a few orders of magnitude slower than the rest.

Note: I included an np.where approach for comparison; I thought it might have good performance but it turns out to be fairly poor. It seems that performing slicing by a boolean array is quite slow in NumPy.

I'm getting about 895 µs for differenceImageV2, while 645 µs for differenceImageV6. Regardless of actual speedup, I still applaud the numpy wizardry.
I think V6 will be faster if you do it as a = img1 - img2; a[img1 < img2] *= 255.
@dede The most fundamental bits are that multiplying by 255 with 8-bit math is equivalent to multiplying by -1 and letting it overflow. a is the basic difference, and b is an array of either 1 or 255 based on if that element of img1 is less than that element of img2.
@Jaime While ingenious, it's even slower than differenceImageV2 on my machine.

Jan Rüegg · Accepted Answer · 2019-09-03 12:13:59Z

11

If you have opencv available, you can also use:

def differenceImageV4(img1, img2):
  return cv2.absdiff(img1, img2)

which is almost the same speed as differenceImageV3.

answered Sep 3, 2019 at 12:13

Jan Rüegg

10.1k8 gold badges66 silver badges112 bronze badges

Comments

user3814063 · Accepted Answer · 2023-11-19 05:22:03Z

Another version that's not quite as fast as V3 but faster than V6 in the accepted answer (at least in python 3.11.6/numpy 1.26.0 on my machine):

def differenceImageV7(img1, img2):
    return np.maximum(img1, img2) - np.minimum(img1, img2)

Using the test from the accepted answer (after trivial corrections for python3) and also including the cv2 answer for comparison:

differenceImageV2  26091624 0.10563181596808136
differenceImageV3  39184764 0.016355138970538974
differenceImageV4  26091624 1.115124913980253
differenceImageV5  26091624 0.14360517100431025
differenceImageV6  26091624 0.10052977490704507
differenceImageV7  26091624 0.05007318709976971
differenceImageCV2 26091624 0.017319782986305654

Worth noting: in the interim V2 also performs much better than it used to.

I've tested these for CUDA C++ kernels and found that your version (v7) is faster than others in the accepted answer

Alexander Soare · Accepted Answer · 2023-07-12 10:47:03Z

It's been a while since the accepted answer so just FYI I did some comparisons on Python3.10 with OpenCV 4.6.0 and Numpy 1.23.0. The OpenCV version is clearly superior, running about as fast as the wrong version (edit: just realized that's exactly what Jan Rüegg was pointing out, but I misread the numbering of the functions). Also, the int16 version is now about as fast as the neat trick.

from time import perf_counter

import cv2
import numpy as np


def wrong(img1, img2):  # fast - but wrong result
    return img1 - img2


def with_int16(img1, img2):
    return np.abs((img1.astype(np.int16) - img2.astype(np.int16))).astype(np.uint8)


def neat_trick(img1, img2):
    a = img1 - img2
    b = np.uint8(img1 < img2) * 254 + 1
    return a * b


def opencv(img1, img2):
    return cv2.absdiff(img1, img2)


def testit():
    for fn in [wrong, with_int16, neat_trick, opencv]:
        times = []
        np_random = np.random.RandomState(0)
        for _ in range(10000):
            img1 = np.uint8(np_random.randint(0, 255, (480, 640)))
            img2 = np.uint8(np_random.randint(0, 255, (480, 640)))
            start = perf_counter()
            fn(img1, img2)
            times.append(perf_counter() - start)
        print(f"{fn.__name__.rjust(10)}: [mean]: {np.mean(times):.3E}, [std] {np.std(times):.3E}")


if __name__ == "__main__":
    testit()

Output:

     wrong: [mean]: 9.198E-06, [std] 1.765E-06
with_int16: [mean]: 6.072E-05, [std] 4.590E-06
neat_trick: [mean]: 5.746E-05, [std] 5.453E-06
    opencv: [mean]: 9.660E-06, [std] 1.589E-06

Collectives™ on Stack Overflow

Fast absolute difference of two uint8 arrays

4 Answers 4

8 Comments

Comments

1 Comment

Comments

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

8 Comments

Comments

1 Comment

Comments

Related