XOR pair frequency queries

Question

We are given an array of length $N$ and $Q$ offline queries, each consisting of a value $K$. For each query we need to count the number of pairs in the array whose XOR is $K$.

If $N$ and $Q$ can both be up to $10^5$, is it possible to answer these queries faster than by traversing the entire array for each query?

The analogous problem of counting pairs with a given sum can use FFT to precompute the frequency of every possible sum, which seems very difficult to do for the XOR operation.

We can assume the values in the array are up to $10^5$ too, but a solution that also works for larger numbers would be preferable.

For your array $a$, you want to count the index pairs $(i,j)$ for which $(a_i \operatorname{xor} a_j) = k$. Is that correct?
– DirkT, Commented Aug 17, 2023 at 8:23
en.wikipedia.org/wiki/Discrete_Fourier_transform_over_a_ring, en.wikipedia.org/wiki/Cyclotomic_fast_Fourier_transform
– D.W. ♦, Commented Aug 17, 2023 at 16:14
The xor is in the exponent, not the coefficients, so I don't see how FFT could be useful.
– Command Master, Commented Aug 20, 2023 at 11:31

DirkT · Accepted Answer · 2023-08-18 08:11:17Z

(The following sample code is written in Scala, but is hopefully understandable without Scala knowledge.)

If I understand your question correctly, you are looking for a faster alternative to the following $\mathcal{O}(n^2)$ function:

def countXorPairsV1( a: Array[Int], k: Int ): Long =
  var n = 0L // Long: the count can reach n^2, which overflows Int for n = 10^5

  for ai <- a do
    for aj <- a do
      if (ai^aj) == k then n += 1
    end for
  end for

  return n
end countXorPairsV1

Radix Splits

The first optimization that you can do is to go through the bits of your integer representation one by one. For each bit index $b$ you recursively split your array $a$ into two parts:

$$ \operatorname{low}_b(a) = \left\{\; x \in a \; | \; \operatorname{bit}_b(x) = 0 \;\right\} $$
$$ \operatorname{high}_b(a) = \left\{\; x \in a \; | \; \operatorname{bit}_b(x) = 1 \;\right\} $$

This is very reminiscent of the splitting performed by radix sort. Depending on the value of $\operatorname{bit}_b(k)$, you can deduce that only either low-high/high-low or low-low/high-high pairings can xor to $k$. In code it looks like this:

def countXorPairsV2( a: Array[Int], k: Int ): Long =

  def count( bit: Int, u: Array[Int], v: Array[Int] ): Long =

    if u.isEmpty || v.isEmpty then return 0
    if bit == 0 then return u.length * v.length.toLong

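    // partition returns the matching elements first, so testing for a cleared
    // bit makes the *_lo arrays really hold the elements whose bit is 0.
    def splitFn( x: Int ) = (x & bit) == 0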

    val (u_lo,u_hi) = u.partition(splitFn)
    val (v_lo,v_hi) = v.partition(splitFn)

    val nxt = bit >>> 1 // <- next bit to be used

    if (k & bit) == 0
    then return count(nxt,u_lo,v_lo) + count(nxt,u_hi,v_hi)
    else return count(nxt,u_lo,v_hi) + count(nxt,u_hi,v_lo)

  end count

  return count(1<<31,a,a)

end countXorPairsV2

Each split requires $\mathcal{O}(n)$ operations. If the integers in $a$ are uniformly distributed, each split should roughly partition $u$ and $v$ into halves. That results in the recursion:

$$T_{V2}[n] = n + 2 \cdot T_{V2}\left[\frac{n}{2}\right]$$
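
Assuming the even splits described above, this recurrence can be unrolled level by level. The following is a sketch that ignores rounding and treats the base case $T_{V2}[1]$ as a constant:

$$T_{V2}[n] = n + 2\,T_{V2}\left[\frac{n}{2}\right] = 2n + 4\,T_{V2}\left[\frac{n}{4}\right] = \dots = n\log_2{n} + n\,T_{V2}[1]$$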

This results in $\mathcal{O}(n\log{n})$ operations per query. But we can go even faster...

Binary Search

If we take some time to sort $a$ beforehand, we can perform splits in $\mathcal{O}(\log{n})$ time using binary search:

import scala.annotation.tailrec // required for the @tailrec annotation below

def countXorPairsV3( a: Array[Int] ): Int => Long =

  val lookup = a.sorted(Integer.compareUnsigned)

  def count( k: Int ): Long =

    def recursion( bit: Int, l0: Int, l2: Int, r0: Int, r2: Int ): Long =

      if l0 >= l2 || r0 >= r2 then return 0
      if bit == 0 then return (l2-l0) * (r2-r0).toLong

      @tailrec def binarySearch( from: Int, until: Int ): Int =
        if from >= until then return from

        val mid = from+until >>> 1

        if (lookup(mid) & bit) == 0
        then return binarySearch(mid+1, until)
        else return binarySearch(from, mid)
      end binarySearch

      val l1 = binarySearch(l0,l2)
      val r1 = binarySearch(r0,r2)
      val nxt = bit >>> 1

      if (k & bit) == 0
      then return recursion(nxt, l0,l1, r0,r1) + recursion(nxt, l1,l2, r1,r2)
      else return recursion(nxt, l0,l1, r1,r2) + recursion(nxt, l1,l2, r0,r1)

    end recursion

    return recursion(1<<31, 0,a.length, 0,a.length)

  end count

  return count

end countXorPairsV3

This should result in $\mathcal{O}(n)$ operations per query, which follows from the recursion:

$$ T_{V3}[n] = \log(n) + 2 \cdot T_{V3}\left[\frac{n}{2}\right] $$
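
To see why this recurrence is linear, one can sum the binary-search cost over the levels of the recursion. The sketch below assumes even splits and $n$ a power of two: level $i$ has $2^i$ subproblems of size $n/2^i$, and substituting $j = \log_2{n} - i$ gives

$$ \sum_{i=0}^{\log_2{n}} 2^i \log_2\left(\frac{n}{2^i}\right) = n \sum_{j=0}^{\log_2{n}} \frac{j}{2^j} \le 2n $$

so the total is $\mathcal{O}(n)$, with the $n$ base cases adding only another linear term.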

Further Optimization Potential

There is a lot more optimization potential, but I feel like I've already gone beyond the scope of a simple answer. So I will only hint at what's possible:

If $a$ contains duplicates, you can group them together.
You can probably modify the binary search to perform a multi-bit split, in order to make sure that $u$ and $v$ are roughly split in half every time.
You can use a tree data structure to be able to split in $\mathcal{O}(1)$ operations.
...
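
One possible reading of the tree hint is a binary trie over the 32 bits of each value, where a split is just following the two child pointers. The sketch below is an assumption about what such a structure could look like, not code from the answer; `TrieNode` and `buildTrieCounter` are hypothetical names, and the recursion mirrors `countXorPairsV3`.

```scala
// Hypothetical binary trie node: one level per bit, most significant bit first.
final class TrieNode:
  var size = 0L            // number of array elements stored below this node
  var lo: TrieNode = null  // subtree of elements whose current bit is 0
  var hi: TrieNode = null  // subtree of elements whose current bit is 1

// Builds the trie once and returns a query function k => number of ordered
// index pairs (i, j) with a(i) ^ a(j) == k (same convention as countXorPairsV1).
def buildTrieCounter(a: Array[Int]): Int => Long =
  val root = TrieNode()

  for x <- a do
    var node = root
    node.size += 1
    var bit = 1 << 31
    while bit != 0 do
      if (x & bit) == 0 then
        if node.lo == null then node.lo = TrieNode()
        node = node.lo
      else
        if node.hi == null then node.hi = TrieNode()
        node = node.hi
      node.size += 1
      bit = bit >>> 1

  // u and v play the role of the index ranges in countXorPairsV3;
  // a null node is an empty range.
  def recursion(bit: Int, u: TrieNode, v: TrieNode, k: Int): Long =
    if u == null || v == null then 0L
    else if bit == 0 then u.size * v.size
    else
      val nxt = bit >>> 1
      if (k & bit) == 0
      then recursion(nxt, u.lo, v.lo, k) + recursion(nxt, u.hi, v.hi, k)
      else recursion(nxt, u.lo, v.hi, k) + recursion(nxt, u.hi, v.lo, k)

  k => recursion(1 << 31, root, root, k)
```

Because each leaf stores a multiplicity in `size`, this sketch also groups duplicates as a side effect. It trades memory (up to 32 nodes per distinct value) for constant-time splits.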

An Interesting Observation

I have not checked, let alone proved, this, but I have the hypothesis that, as long as $a$ contains unique uniformly distributed values, there should be $\mathcal{O}(n)$ pairs that $\operatorname{xor}$ to $k$.
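
A short counting argument suggests that the bound holds for any array of distinct values, whether or not they are uniformly distributed. Each $a_i$ can only pair with the single value $a_i \operatorname{xor} k$, which occurs at most once in $a$:

$$ \#\left\{\; (i,j) \; | \; a_i \operatorname{xor} a_j = k \;\right\} = \sum_{i} \#\left\{\; j \; | \; a_j = a_i \operatorname{xor} k \;\right\} \le \sum_{i} 1 = n $$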

Isn't there a much simpler way to do this? It should be true that ai^aj == k when ai^k == aj, so you can just check whether you can find each number's xor with k. If you first put the numbers in a hash table or some other data structure with O(1) inserts and queries, you can do the whole thing in O(n) time. Duplicates need to be handled separately, but that shouldn't be too difficult.
– QuantumWiz, Commented Sep 20, 2023 at 17:21
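
The hash-table idea from the comment above could be sketched as follows. This is a minimal illustration, not code from the thread: `buildXorPairCounter` is a hypothetical name, duplicates are handled by storing multiplicities, and pairs are counted as ordered index pairs, as in `countXorPairsV1`.

```scala
// Hypothetical helper: preprocesses the array once, then answers each query
// in time proportional to the number of distinct values.
def buildXorPairCounter(a: Array[Int]): Int => Long =
  // Multiplicity of every distinct value in the array.
  val freq: Map[Int, Long] = a.groupMapReduce(identity)(_ => 1L)(_ + _)

  def count(k: Int): Long =
    // A value x pairs exactly with the elements equal to x ^ k,
    // so each distinct x contributes freq(x) * freq(x ^ k) ordered pairs.
    freq.iterator.map { case (x, c) => c * freq.getOrElse(x ^ k, 0L) }.sum

  count
```

For example, `buildXorPairCounter(Array(1, 2, 3, 1))(3)` gives 4: each of the two 1s pairs with the single 2, counted in both orders.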

gnasher729 · Accepted Answer · 2023-08-20 09:53:10Z

A simple method: Calculate the array B = A xor k. Sort arrays A and B and count the common entries. Time is O(n log n) per query, with n array elements.

This works because (x xor y) = k iff x = (y xor k). So it wouldn’t help for “find all pairs with a product equal to k”.
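
A minimal sketch of this sort-and-merge method, under the assumption that "common entries" are counted with multiplicity (a value occurring p times in A and q times in B contributes p * q ordered pairs). The name `countXorPairsSorted` is hypothetical.

```scala
// Counts ordered index pairs (i, j) with a(i) ^ a(j) == k by sorting
// A and B = A xor k and merging the two sorted arrays.
def countXorPairsSorted(a: Array[Int], k: Int): Long =
  val xs = a.sorted
  val ys = a.map(_ ^ k).sorted

  var i = 0
  var j = 0
  var total = 0L

  while i < xs.length && j < ys.length do
    if xs(i) < ys(j) then i += 1
    else if xs(i) > ys(j) then j += 1
    else
      // Same value in both arrays: measure the length of each run.
      val v = xs(i)
      var runA = 0L
      while i < xs.length && xs(i) == v do { runA += 1; i += 1 }
      var runB = 0L
      while j < ys.length && ys(j) == v do { runB += 1; j += 1 }
      total += runA * runB

  total
```

Sorting A only needs to happen once across all queries, but B has to be rebuilt and sorted for every k, which is where the per-query O(n log n) comes from.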


gnasher729 · Accepted Answer · 2023-08-20 11:22:13Z

With your small numbers, you just create an array that counts how often each number turns up, say f[i] = number of array elements equal to i. Then for query K, you add up f[i] * f[i xor K] for all i. So if all values are less than M = 2^n, then you need max(M, N) steps to fill the array f, and Q * M steps for all queries, independent of N.
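
The frequency-array method might look like the sketch below. The function name `buildFrequencyCounter` and the `bits` parameter (so that M = 2^bits) are assumptions made for illustration. For values up to $10^5$, `bits = 17` would suffice.

```scala
// Assumes every array value lies in [0, 2^bits); returns a query function
// that sums f(i) * f(i xor k) over all i in [0, 2^bits).
def buildFrequencyCounter(a: Array[Int], bits: Int): Int => Long =
  require(bits >= 0 && bits <= 30, "bits must be in 0..30")
  val m = 1 << bits
  require(a.forall(x => x >= 0 && x < m), "values must lie in [0, 2^bits)")

  // f(i) = number of array elements equal to i
  val f = new Array[Long](m)
  for x <- a do f(x) += 1

  def count(k: Int): Long =
    // If k has a bit outside the value range, i ^ k never lands in [0, m).
    if k < 0 || k >= m then 0L
    else
      var total = 0L
      var i = 0
      while i < m do
        total += f(i) * f(i ^ k)
        i += 1
      total

  count
```

Each query costs M steps regardless of N, which matches the Q * M estimate above.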

If N is small compared to M, you create a linked list of the indices i with f[i] ≠ 0; then the queries take Q * N’ steps, where N’ is the number of different values in the array A. You’d actually do this whenever there are far fewer than M different values, even if N is large.

Since the naive method takes Q * N^2 steps, you’d use that if N^2 is not much larger than M.

Command Master · Accepted Answer · 2023-08-20 12:38:04Z

There is a simple reduction from 3XOR when you have large numbers, so it's unlikely that there is an efficient solution (one faster than the $O(NQ)$ that can be achieved with a hash table).

In a bit more detail, take the array and the list of queries to be the same. If any of the queries, say $c$, has a non-zero answer, then you know there's a solution to $a \oplus b = c$, which can then be found in $O(N)$ time.

In fact, it even lets one count the number of solutions to 3XOR. So even if the numbers are small and randomized algorithms could find a solution to 3XOR efficiently (because either one exists with high probability or there aren't many numbers), you likely can only get approximate answers.
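
The counting claim can be written out explicitly. Here $P(c)$ denotes the answer to query $c$, a notation introduced only for this illustration, and the array $a_1, \dots, a_N$ doubles as the list of queries. Summing the answers then gives the number of 3XOR solutions, counted as ordered index triples and including degenerate ones where indices coincide:

$$ \#\left\{\; (i,j,l) \; | \; a_i \oplus a_j = a_l \;\right\} = \sum_{l=1}^{N} P(a_l) $$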
