When I compile the following code with GCC 9.3.0 at `-O2` and run it on Ubuntu 20.04 LTS (x86_64), I get unexpected output:
```cpp
#include <algorithm>
#include <iostream>
int c[2];  // static storage duration, so zero-initialized
void f(__int128 p) {
    c[p + 1] = 1;
    c[p + 1] = std::max(c[p + 1], c[p] + 1);
    return;
}
int main() {
    f(0);
    std::cout << c[1] << std::endl;  // expected: 1
    return 0;
}
```
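It is supposed to output 1, but it actually outputs 0. Since `c` is zero-initialized, `f(0)` first sets `c[1] = 1` and then `c[1] = std::max(1, c[0] + 1) = std::max(1, 1) = 1`. I reviewed the code and did not find anything wrong with it, such as undefined behavior. It seems that the compiler generates an incorrect program.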
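One way to double-check the "no undefined behavior" claim is to replay the body of `f` inside a constant expression: constant evaluation must reject undefined behavior such as an out-of-bounds access, and it should not depend on the `-O` level. The sketch below uses a hypothetical helper, `replay_f`, which is not part of the original code; it substitutes a local zero-initialized array for the global `c` and assumes GCC or Clang on a 64-bit target (where `__int128` exists) with C++14 or later.

```cpp
#include <algorithm>

// Hypothetical helper: replays f()'s two statements on a local array during
// constant evaluation. If this compiles, the f(0) path has no UB (the compiler
// would have to reject it otherwise) and c[1] ends up as 1.
constexpr int replay_f(__int128 p) {
    int c[2] = {};                             // same starting state as the global
    c[p + 1] = 1;                              // c[1] = 1
    c[p + 1] = std::max(c[p + 1], c[p] + 1);   // max(1, 0 + 1) == 1
    return c[1];
}

static_assert(replay_f(0) == 1, "f(0) must leave c[1] == 1");
```

Because the check happens at compile time, it says nothing about the code GCC emits for the runtime `f`; it only confirms what the correct result is.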
I tried different optimization options: `-O2`, `-O3`, and `-Ofast` incorrectly output 0, while `-O` and `-O0` give the correct output of 1.
If I change `__int128` to another type (`__uint128_t`, `int`, or any other integral type), it correctly outputs 1.
I added `__attribute__((noinline))` to `void f(__int128 p)`, which does not change the output. I then checked the assembly on Godbolt and found that `f` stores `c[p] + 1` into `c[p + 1]` only if `c[p] + 1 > 1` and otherwise does nothing. This is inconsistent with the code semantics, because the unconditional store `c[p + 1] = 1` is gone: with `c[0] == 0` the condition is false, so `c[1]` stays 0.
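To make the discrepancy concrete, here is a sketch that writes both behaviors as plain C++. `f_intended` is the source semantics; `f_as_observed` is a hypothetical rendering of the branch described for the `-O2` assembly, not an actual decompilation. Both take a pointer and a `long long` index purely for illustration, since the question reports that non-`__int128` index types behave correctly.

```cpp
#include <algorithm>

// Source semantics: store 1, then store max(1, c[p] + 1).
void f_intended(int* c, long long p) {
    c[p + 1] = 1;
    c[p + 1] = std::max(c[p + 1], c[p] + 1);
}

// Hypothetical model of the described -O2 code: the store of 1 is missing,
// and c[p + 1] is written only when c[p] + 1 > 1.
void f_as_observed(int* c, long long p) {
    if (c[p] + 1 > 1) {
        c[p + 1] = c[p] + 1;
    }
}
```

Starting from `{0, 0}`, `f_intended(c, 0)` leaves `c[1] == 1`, while `f_as_observed(c, 0)` skips the store because `0 + 1 > 1` is false and leaves `c[1] == 0`, matching the wrong output.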
I tried other GCC versions on Godbolt: every x86-64 GCC from 9.0 to 13.2 gives the incorrect output with `-O2`, while older and newer GCC versions, as well as other compilers such as Clang, give the correct one.
I checked the list of problem reports known to be fixed in the GCC 13.3 release, but I could not match any of them to this issue.
I wonder whether this is a compiler bug or a problem with my code. It is really confusing.
I also observed that if I change `std::max(c[p + 1], c[p] + 1)` to `std::max(c[p + 1], 1)`, the wrong output appears only up to GCC 11.2, which differs from the previous case. This suggests that GCC 11.3 may have partly fixed the issue, and I am still uncertain whether GCC 13.3 has fully resolved it.
Does `int c[2] = {};` change something?

`c` has static storage duration, so it should be zero-initialized as is.

If I change it to `void f(int* c, __int128 p)` (and pass in the array), it works, but if I change it to `void f(int(&c)[2], __int128 p)`, it incorrectly optimises again. Seems to be to do with some range assumptions on `p` with the array bound.

… `__int128` fixes the issue.
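The pointer-based variant from the comment can be written out as below. It is a sketch of the reported workaround, assuming the behavior matches the commenter's compiler, and not a guaranteed fix for every affected GCC release. The parameter is named `dst` here instead of `c` only to avoid shadowing the global.

```cpp
#include <algorithm>

int c[2];  // zero-initialized global, as in the question

// Variant reported in the comments to give the correct result: the array is
// passed as a plain pointer rather than accessed as the global int[2] (or via
// int(&)[2]). The commenter suspects the bug involves range assumptions on p
// derived from the array bound; that explanation is unconfirmed.
void f(int* dst, __int128 p) {
    dst[p + 1] = 1;
    dst[p + 1] = std::max(dst[p + 1], dst[p] + 1);
}
```

Calling it as `f(c, 0)` should leave `c[1] == 1`.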