25 votes
2 answers
4k views

Does excessive use of [[likely]] and [[unlikely]] really degrade program performance in C++?

The C++ standard [dcl.attr.likelihood] says: [Note 2: Excessive usage of either of these attributes is liable to result in performance degradation. — end note] I’m trying to understand what “...
Artyom Fedosov
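
For readers who have not used the attributes the quoted note refers to, here is a minimal sketch of the C++20 syntax. The function `sum_non_negative` is a hypothetical illustration and does not come from the question. It only shows where the attribute is placed and makes no claim about whether the hint helps or hurts in any given build.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the question): sums the values and
// returns -1 as soon as a negative entry is seen.
std::int64_t sum_non_negative(const std::vector<int>& values) {
    std::int64_t total = 0;
    for (int v : values) {
        // The attribute marks this branch as rarely taken. It is only a hint;
        // the compiler may ignore it, and the program's result is unchanged.
        if (v < 0) [[unlikely]] {
            return -1;
        }
        total += v;
    }
    return total;
}
```

Because the attributes affect only code layout and optimization heuristics, removing the annotation must produce the same results, so any difference it makes has to be measured.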
2 votes
0 answers
87 views

Clang __builtin_constant_p Inconsistent Behavior Issue Report [closed]

Problem Description: In Clang 21.1.0, the __builtin_constant_p builtin function exhibits inconsistent behavior when using constant arrays (.rodata) versus stack arrays. The function returns true in ...
Moi5t
  • 493
3 votes
0 answers
105 views

How well can clang 20 infer the likelihood of branches without annotations?

I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
meisel
  • 2,593
0 votes
0 answers
39 views

How to build a gcc_tree_node from custom language Nodes

Nodes: building a gcc_tree_node for a custom programming language; it compiles to and is based on C++26; the modules are available; the language uses a tab-block system; every keyword starts with '/'. I want to ...
Adam Bekoudj
1 vote
1 answer
130 views

Can the compiler elide a const local copy of const& vector parameter?

Consider these two functions: int foo(std::array<int, 10> const& v) { auto const w = v; int s{}; for (int i = 0; i < v.size(); ++i) { s += (i % 2 == 0 ? v : w)[i]; ...
Enlico
  • 30.1k
5 votes
3 answers
268 views

How to make the optimiser treat a local function as a black box and not optimise based on its implementation?

I thought that the noinline function attribute would force the compiler to treat a local function as a black box: __attribute__((noinline)) void touch_noinline(int&) {} void touch_external(int&...
sh1
  • 4,990
0 votes
1 answer
64 views

Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?

I was testing various expressions of a sixth-order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression of length 6 that provokes poor code generation ...
Martin Brown
  • 3,586
2 votes
0 answers
61 views

Why is sequential indexing with fixed length stride slower in Estrin's method?

Preparing to make Estrin's method vectorisable, I changed from normal linear indexing of the coefficients to bit-reversed indexing and restricted it to strictly powers of 2. Neither MSVC nor ICX can see how to ...
Martin Brown
  • 3,586
1 vote
1 answer
117 views

How is Rust able to optimize Option::is_some_and so effectively? [closed]

Looking at the codegen of a check inside a for loop, I wanted to see if there is an optimization opportunity in outlining is_some_and, but both cases had the same codegen. struct V { len: Option<...
A. K.
  • 39.2k
5 votes
1 answer
182 views

Why are [[no_unique_address]] members not transparently replaceable?

In the classic talk An (In-)Complete Guide to C++ Object Lifetimes by Jonathan Müller, there is a useful guideline as follows: Q: When do I need to use std::launder? A: When you want to re-use the ...
xmllmx
  • 44.5k
7 votes
1 answer
317 views

Does GCC optimize array access with __int128 indexes incorrectly?

When compiling the following code using GCC 9.3.0 with O2 optimization enabled and running it on Ubuntu 20.04 LTS, x86_64 architecture, unexpected output occurs. #include <algorithm> #include &...
mzd6id99
1 vote
2 answers
188 views

GCC switch statements do not simplify on identical handling

The switch statements in the following two functions int foo(int value) { switch (value) { case 0: return 0; case 1: return 0; case 2: return 1; } } int ...
notgapriel
1 vote
0 answers
99 views

Is a critical section protected by a semaphore, mutex, etc. implicitly volatile? [duplicate]

Say I have an array of integers, int array[NUM_ELEMENTS];, and access to it is encapsulated in setter and getter functions that are protected by synchronization such as a semaphore, mutex, etc. Do I need to ...
PkDrew
  • 2,659
4 votes
1 answer
151 views

Optimize computation of the real part of a complex product

I need (only) the real part of the product of two complex numbers. Naturally, I can code this as real(x)*real(y) - imag(x)*imag(y); or real(x*y); The latter, however, formally first computes the ...
Walter
  • 45.8k
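
The two spellings mentioned in the excerpt can be put side by side as a minimal sketch. The function names `real_of_product_direct` and `real_of_product_full` are hypothetical, and `std::complex<double>` is an assumption, since the excerpt does not state the element type. The sketch only shows that both forms compute the same quantity for ordinary finite inputs; whether a compiler discards the unused imaginary part in the second form is the open question and is not demonstrated here.

```cpp
#include <complex>

// Real part computed directly from the components:
// two multiplications and one subtraction.
double real_of_product_direct(const std::complex<double>& x,
                              const std::complex<double>& y) {
    return x.real() * y.real() - x.imag() * y.imag();
}

// Real part taken from the full complex product, which formally
// also computes the imaginary part before it is thrown away.
double real_of_product_full(const std::complex<double>& x,
                            const std::complex<double>& y) {
    return std::real(x * y);
}
```

For x = 1 + 2i and y = 3 + 4i the product is -5 + 10i, so both functions return -5. Comparing the generated assembly of the two functions at the optimization level in use is one way to see whether the extra work survives.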
29 votes
1 answer
4k views

Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?

I noticed that modern C compilers typically use push instructions to save registers, rather than explicit mov + sub sequences. However, based on llvm-mca simulations, the mov approach ...
Moi5t
  • 493
