As data is distributed between 0 and 255 when the array is sorted, around the first half of the iterations will not enter the if-statement (the if statement is shared below).
if (data[c] >= 128)
sum += data[c];
The question is: What makes the above statement not execute in certain cases as in case of sorted data? Here comes the "branch predictor". A branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving high effective performance!
Let's do some bench marking to understand it better
The performance of an if-statement depends on whether its condition has a predictable pattern. If the condition is always true or always false, the branch prediction logic in the processor will pick up the pattern. On the other hand, if the pattern is unpredictable, the if-statement will be much more expensive.
Let’s measure the performance of this loop with different conditions:
for (int i = 0; i < max; i++)
if (condition)
sum++;
Here are the timings of the loop with different true-false patterns:
Condition Pattern Pattern Time (ms)
-------------------------------------------------------
(i & 0×80000000) == 0 T repeated 322
(i & 0xffffffff) == 0 F repeated 276
(i & 1) == 0 TF alternating 760
(i & 3) == 0 TFFFTFFF… 513
(i & 2) == 0 TTFFTTFF… 1675
(i & 4) == 0 TTTTFFFFTTTTFFFF… 1275
(i & 8) == 0 8T 8F 8T 8F … 752
(i & 16) == 0 16T 16F 16T 16F … 490
A “bad” true-false pattern can make an if-statement up to six times slower than a “good” pattern! Of course, which pattern is good and which is bad depends on the exact instructions generated by the compiler and on the specific processor.
So there is no doubt about the impact of branch prediction on performance!