3,396 questions
25 votes, 2 answers, 4k views
Does excessive use of [[likely]] and [[unlikely]] really degrade program performance in C++?
The C++ standard [dcl.attr.likelihood] says:
[Note 2: Excessive usage of either of these attributes is liable to result in performance degradation.
— end note]
I’m trying to understand what “...
2 votes, 0 answers, 87 views
Clang __builtin_constant_p Inconsistent Behavior Issue Report [closed]
Problem Description
In Clang 21.1.0, the __builtin_constant_p builtin function exhibits inconsistent behavior when using constant arrays (.rodata) versus stack arrays. The function returns true in ...
3 votes, 0 answers, 105 views
How well can clang 20 infer the likelihood of branches without annotations?
I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
0 votes, 0 answers, 39 views
How to build a gcc_tree_node from custom language Nodes
Nodes:
building a gcc_tree_node for a custom programming language
compiled with and based on C++26
the modules are available
the language uses a tab-block system
every keyword starts with '/'
I want to ...
1 vote, 1 answer, 130 views
Can the compiler elide a const local copy of const& vector parameter?
Consider these two functions:
int foo(std::array<int, 10> const& v) {
    auto const w = v;
    int s{};
    for (int i = 0; i < v.size(); ++i) {
        s += (i % 2 == 0 ? v : w)[i];
...
5 votes, 3 answers, 268 views
How to make the optimiser treat a local function as a black box and not optimise based on its implementation?
I thought that the noinline function attribute would force the compiler to treat a local function as a black box:
__attribute__((noinline)) void touch_noinline(int&) {}
void touch_external(int&...
0 votes, 1 answer, 64 views
Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?
I was testing various expressions of a sixth-order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression of length 6 that provokes poor code generation ...
2 votes, 0 answers, 61 views
Why is sequential indexing with fixed length stride slower in Estrin's method?
Preparing to make Estrin's method vectorisable, I changed from normal linear indexing of the coefficients to bit-reversed indexing and restricted it to strict powers of 2. Neither MSVC nor ICX can see how to ...
1 vote, 1 answer, 117 views
How is Rust able to optimize Option::is_some_and so effectively? [closed]
Looking at the codegen of a check inside a for-loop, I wanted to see whether outlining is_some_and offered an optimization opportunity, but both cases had the same codegen.
struct V {
len: Option<...
5 votes, 1 answer, 182 views
Why are [[no_unique_address]] members not transparently replaceable?
In the classic talk An (In-)Complete Guide to C++ Object Lifetimes by Jonathan Müller, there is a useful guideline as follows:
Q: When do I need to use std::launder?
A: When you want to re-use the ...
7 votes, 1 answer, 317 views
Does GCC optimize array access with __int128 indexes incorrectly?
When compiling the following code using GCC 9.3.0 with O2 optimization enabled and running it on Ubuntu 20.04 LTS, x86_64 architecture, unexpected output occurs.
#include <algorithm>
#include ...
1 vote, 2 answers, 188 views
GCC switch statements do not simplify on identical handling
The switch statements in the following two functions
int foo(int value) {
    switch (value) {
    case 0:
        return 0;
    case 1:
        return 0;
    case 2:
        return 1;
    }
}
int ...
1 vote, 0 answers, 99 views
Is a critical section protected by a semaphore, mutex, etc. implicitly volatile? [duplicate]
Say I have an array of integers, int array[NUM_ELEMENTS];, and access to it is encapsulated in setter and getter functions that are protected by synchronization such as a semaphore, mutex, etc. Do I need to ...
4 votes, 1 answer, 151 views
Optimize computation of real part of complex product
I need (only) the real part of the product of two complex numbers. Naturally, I can code this as
real(x)*real(y) - imag(x)*imag(y);
or
real(x*y);
The latter, however, formally first computes the ...
29 votes, 1 answer, 4k views
Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?
I noticed that modern C compilers typically use push instructions to save caller-saved registers, rather than explicit mov + sub sequences. However, based on llvm-mca simulations, the mov approach ...