Newest 'compiler-optimization' Questions

25 votes

2 answers

4k views

Does excessive use of [[likely]] and [[unlikely]] really degrade program performance in C++?

The C++ standard [dcl.attr.likelihood] says: [Note 2: Excessive usage of either of these attributes is liable to result in performance degradation. — end note] I’m trying to understand what “...

Artyom Fedosov

511

asked Oct 7 at 17:51
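
For readers unfamiliar with the attributes this question is about, here is a minimal sketch of the C++20 syntax. The function `sign_of` and its checker are hypothetical and do not come from the question. The sketch only shows where the attribute is placed; it says nothing about whether a given use helps or hurts performance.

```cpp
#include <cassert>

// Hypothetical example (not from the question): the negative-input branch
// is annotated as the rare case. Requires C++20.
int sign_of(int value) {
    if (value < 0) [[unlikely]] {
        return -1;  // hinted to the compiler as the cold path
    }
    if (value == 0) {
        return 0;
    }
    return 1;
}

// Small usage check: the attribute is only a hint and does not change
// the observable result of the function.
inline void check_sign_of() {
    assert(sign_of(-3) == -1);
    assert(sign_of(0) == 0);
    assert(sign_of(7) == 1);
}
```

Any effect on generated code depends on the compiler and optimization level, so it has to be measured per case.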

2 votes

0 answers

87 views

Clang __builtin_constant_p Inconsistent Behavior Issue Report [closed]

Problem Description In Clang 21.1.0, the __builtin_constant_p builtin function exhibits inconsistent behavior when using constant arrays (.rodata) versus stack arrays. The function returns true in ...

Moi5t

493

asked Oct 3 at 3:30

3 votes

0 answers

105 views

How well can clang 20 infer the likelihood of branches without annotations?

I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...

meisel

2,593

asked Sep 26 at 16:14

0 votes

0 answers

39 views

How to build a gcc_tree_node from custom language Nodes

Nodes: building a gcc_tree_node for a custom programming language compiled and based on C++26; modules are available, the language uses a tab-block system, and every keyword starts with '/'. I want to ...

Adam Bekoudj

1

asked Sep 20 at 17:41

1 vote

1 answer

130 views

Can the compiler elide a const local copy of const& vector parameter?

Consider these two functions: int foo(std::array<int, 10> const& v) { auto const w = v; int s{}; for (int i = 0; i < v.size(); ++i) { s += (i % 2 == 0 ? v : w)[i]; ...

Enlico

30.1k

asked Sep 18 at 8:49

5 votes

3 answers

268 views

How to make the optimiser treat a local function as a black box and not optimise based on its implementation?

I thought that the noinline function attribute would force the compiler to treat a local function as a black box: __attribute__((noinline)) void touch_noinline(int&) {} void touch_external(int&...

sh1

4,990

asked Sep 15 at 5:06

0 votes

1 answer

64 views

Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?

I was testing various expressions of a sixth order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression length 6 that provokes poor code generation ...

Martin Brown

3,586

asked Sep 12 at 16:51

2 votes

0 answers

61 views

Why is sequential indexing with fixed length stride slower in Estrin's method?

Preparing to make Estrin's method vectorisable I changed from normal linear indexing of the coefficients to bitreversed and restricted it to strictly powers of 2. Neither MSVC nor ICX can see how to ...

Martin Brown

3,586

asked Sep 1 at 17:29

1 vote

1 answer

117 views

How is rust able to optimize Option::is_some_and so effectively? [closed]

Looking at the codegen of a check inside for-loop I wanted to see if there is an optimization opportunity by outlining is_some_and but both cases had the same codegen. struct V { len: Option<...

A. K.

39.2k

asked Aug 27 at 20:43

5 votes

1 answer

182 views

Why are [[no_unique_address]] members not transparently replaceable?

In the classic talk An (In-)Complete Guide to C++ Object Lifetimes by Jonathan Müller, there is a useful guideline as follows: Q: When do I need to use std::launder? A: When you want to re-use the ...

xmllmx

44.5k

asked Aug 10 at 16:14

7 votes

1 answer

317 views

Does GCC optimize array access with __int128 indexes incorrectly?

When compiling the following code using GCC 9.3.0 with O2 optimization enabled and running it on Ubuntu 20.04 LTS, x86_64 architecture, unexpected output occurs. #include <algorithm> #include &...

mzd6id99

73

asked Aug 10 at 10:01

1 vote

2 answers

188 views

GCC switch statements do not simplify on identical handling

The switch statements in the following two functions int foo(int value) { switch (value) { case 0: return 0; case 1: return 0; case 2: return 1; } } int ...

notgapriel

125

asked Aug 5 at 5:42

1 vote

0 answers

99 views

Is a critical section protected by a semaphore, mutex, etc. implicitly volatile? [duplicate]

Say I have an array of integers, int array[NUM_ELEMENTS];, and access to it is encapsulated in setter and getter functions that are well protected by synchronization such as a semaphore, mutex, etc. Do I need to ...

PkDrew

2,659

asked Jul 31 at 1:28

4 votes

1 answer

151 views

optimize computation of real part of complex product

I need (only) the real part of the product of two complex numbers. Naturally, I can code this as real(x)*real(y) - imag(x)*imag(y); or real(x*y); The latter, however, formally first computes the ...

Walter

45.8k

asked Jul 29 at 16:52
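
The first form quoted in the excerpt can be sketched as a small helper. The names `real_of_product` and `check_real_of_product` are hypothetical, chosen here for illustration, and this is only one way to write the expression from the question.

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper (name is an assumption, not from the question):
// computes Re(x * y) from the components, without forming the imaginary part.
template <typename T>
T real_of_product(const std::complex<T>& x, const std::complex<T>& y) {
    return x.real() * y.real() - x.imag() * y.imag();
}

// Small usage check with values that are exactly representable in double:
// (1 + 2i) * (3 + 4i) = -5 + 10i, so the real part is -5.
inline void check_real_of_product() {
    const std::complex<double> x(1.0, 2.0);
    const std::complex<double> y(3.0, 4.0);
    assert(real_of_product(x, y) == -5.0);
    assert(real_of_product(x, y) == std::real(x * y));
}
```

For finite inputs like these the two forms agree. For infinite or NaN inputs the direct form may differ from `real(x*y)`, depending on how the implementation's complex multiplication handles those cases.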

29 votes

1 answer

4k views

Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?

I noticed that modern C compilers typically use push instructions to save caller-saved registers, rather than explicit mov + sub sequences. However, based on llvm-mca simulations, the mov approach ...

Moi5t

493

asked Jul 29 at 4:12
