0
$\begingroup$

I've done some very basic benchmarking of the following:

#include <stddef.h>
#include <stdlib.h>

#define LOOP_CNT 1000000000

int main(void) {
        int a, b, c;

        for (size_t i = 0; i < LOOP_CNT; i++) {

#ifdef TMP
                c = a;
                a = b;
                b = c;
#endif

#ifdef XOR
                a ^= b;
                b ^= a;
                a ^= b;
#endif
        }

        return EXIT_SUCCESS;

}

I compile with either -DTMP or -DXOR, together with -g and -O0, using both gcc and icx.

Here are the results:

$ time ./swap_gcc_xor 
real    0m3.721s
user    0m3.721s
sys     0m0.000s

$ time ./swap_icx_xor
real    0m0.974s
user    0m0.974s
sys     0m0.000s

$ time ./swap_gcc_tmp
real    0m0.889s
user    0m0.889s
sys     0m0.001s

$ time ./swap_icx_tmp
real    0m0.836s
user    0m0.832s
sys     0m0.005s

Why is the TMP version generally faster? Both versions have data hazards, and an XOR instruction is trivial for the ALU.

I have self-studied CPU micro-architecture, so please answer technically.

$\endgroup$
3
  • 2
    $\begingroup$ To begin with, did you actually check the generated assembly to see what the machine thinks it is doing? $\endgroup$ Commented Oct 16 at 14:19
  • 2
    $\begingroup$ With the compiler optimization turned off, the code generated for each version of the test (tmp vs. xor) will be the most debugger friendly code, not the fastest code. As Emil pointed out, you really need to look at the assembly code generated by the compiler when no optimization is performed vs. that produced at the highest level of optimization. Your example code, when compiled at maximum optimization, may even result in no code at all, due to the fact that computations are performed but not used. $\endgroup$ Commented Oct 16 at 14:41
  • $\begingroup$ Please do not use an even number of swaps. $\endgroup$ Commented Oct 17 at 6:31

2 Answers

1
$\begingroup$

Every version of your code reads the uninitialized variables a and b, which is undefined behaviour, so the compiler is free to produce any code it wants. No conclusions can be drawn from the timings.
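
A minimal sketch of how the two loops could be given defined behaviour, assuming the problem meant here is the read of the uninitialized a and b. The helper names swap_tmp_loop and swap_xor_loop are hypothetical and not from the question; the values are passed in initialized and written back so the result is observable by the caller.

```c
#include <stddef.h>

/* Hypothetical helper: swap a and b n times through a temporary.
   The inputs are initialized by the caller, so no indeterminate
   value is ever read. */
void swap_tmp_loop(int *pa, int *pb, size_t n) {
    int a = *pa, b = *pb, c;
    for (size_t i = 0; i < n; i++) {
        c = a;
        a = b;
        b = c;
    }
    *pa = a;  /* write back so the work has a visible effect */
    *pb = b;
}

/* Hypothetical helper: the same loop using the XOR swap. */
void swap_xor_loop(int *pa, int *pb, size_t n) {
    int a = *pa, b = *pb;
    for (size_t i = 0; i < n; i++) {
        a ^= b;
        b ^= a;
        a ^= b;
    }
    *pa = a;
    *pb = b;
}
```

With an odd n the two values end up exchanged; with an even n they return to their starting positions. An optimizing compiler may still simplify or remove such a loop, so this only addresses the undefined behaviour, not the benchmarking method as a whole.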

$\endgroup$
0
$\begingroup$

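It may be because the CPU can extract more instruction-level parallelism (ILP) from the version with the temporary variable: most of its dependencies are false (write-after-read) ones, which do not force one instruction to wait for another's result. In the XOR version each statement needs the result of the previous one, so the three operations form a chain of true read-after-write (RAW) dependencies.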
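
The dependency structure can be sketched by annotating a single swap of each kind. This is only an illustration at the source level; the pair type and the function names are hypothetical, and how the statements map to instructions depends on the compiler and flags.

```c
/* Hypothetical type holding the two values being swapped. */
typedef struct { int a, b; } pair;

/* Swap through a temporary. */
pair swap_tmp(pair p) {
    int c = p.a;  /* (1) reads a */
    p.a = p.b;    /* (2) reads b, overwrites a: write-after-read against (1),
                         but does not need the result of (1) */
    p.b = c;      /* (3) needs c from (1): the only true (RAW) dependency */
    return p;
}

/* XOR swap. */
pair swap_xor(pair p) {
    p.a ^= p.b;   /* (1) a' = a ^ b */
    p.b ^= p.a;   /* (2) needs a' from (1): RAW */
    p.a ^= p.b;   /* (3) needs b' from (2): RAW */
    return p;
}
```

In swap_tmp, statements (1) and (2) read independent inputs, so the longest chain of true dependencies has two steps; in swap_xor it has three. Whether this alone accounts for the measured timings is not established here.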

$\endgroup$
1
  • $\begingroup$ It's not clear to me if this answers the question, or should be a comment. $\endgroup$ Commented Oct 23 at 1:47
