
There are many questions about detecting integer overflow BEFORE the actual addition/subtraction because of possible undefined behavior. So, my question is:

Why does it produce undefined behavior in the first place?

I can think of 2 causes:

1) A processor that generates an exception in this case. Sure, it can be toggled off, and most probably a well-written CRT will do that.

2) A processor that uses another binary representation of numbers (1's complement? base 10?). In that case the undefined behavior will manifest itself as a different result (but will not crash!). Well, we could live with that.

So, why should someone avoid causing it? Am I missing something?


4 Answers


While the historical reason signed overflow was specified as undefined behavior was probably these bogus legacy representations (ones' complement/sign-magnitude) and overflow interrupts, the modern reason for it to remain undefined behavior is optimization. As J-16 SDiZ hinted at, the fact that signed overflow is undefined behavior allows the compiler to optimize out some conditionals whose algebraic truth (but not necessarily representation-level truth) is already established by a previous branch. It may also allow the compiler to algebraically simplify some expressions (especially those involving multiplication or division) in ways that could give different results than the originally-written order of evaluation if a subexpression contains an overflow, since the compiler is allowed to assume that overflow does not happen with the operands you've given it.
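To make that concrete, here is a minimal sketch (the function names are illustrative and not from the answer; whether a particular compiler actually performs these folds depends on its optimizer and flags) of the kind of simplification that assuming "no signed overflow" permits:

int always_true(int x)
{
    /* Signed overflow is undefined, so the compiler may assume x + 1
       never wraps and fold this comparison to 1 (true). */
    return x + 1 > x;
}

int simplified(int x)
{
    /* The compiler may reduce this to just x, even though the
       intermediate x * 2 could overflow for large x. */
    return (x * 2) / 2;
}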

The other huge example of undefined behavior for the purpose of permitting optimization is the aliasing rules.
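For comparison, a minimal sketch of the aliasing case (illustrative code, not from the answer): because int and float are incompatible types, the compiler may assume the two pointers do not refer to the same object.

int f(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* under the aliasing rules, this cannot modify *ip */
    return *ip;   /* so the compiler may return the constant 1 */
}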


2 Comments

One could argue that unsigned integers should behave consistently, because why not assume that a + b >= a and a + b >= b for unsigned integers in the optimizer (similar to concluding a + b > 0 for a > 0 and b > 0 in J-16 SDiZ's example for signed integers)? In my eyes this is more of an oddity in the C spec.
@nucleon: The unsigned types are specified as modular arithmetic (where those algebraic equivalences of inequalities do not hold). The signed types are not. If you like to call this an oddity in C, so be it, but that's how it is and it's not really something you can change.
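A short sketch of the distinction drawn in the comment above (illustrative, self-contained): unsigned arithmetic is defined to wrap modulo 2^N, while the equivalent signed expression is undefined.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    printf("%u\n", u + 1u);   /* well-defined: wraps to 0 */

    int i = INT_MAX;
    /* i + 1 would be signed overflow: undefined behavior, so the
       optimizer may assume it never happens. */
    (void)i;
    return 0;
}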

Although most modern CPUs use 2's complement and integer overflow results in predictable modulo wraparound, this is by no means universal. To keep the language sufficiently general that it can be used on the widest range of architectures, it's better to specify that integer overflow is UB.

5 Comments

Any CPU is two's complement if your compiler simply omits generating the useless signed arithmetic opcodes and uses the unsigned ones for both signed and unsigned types. This may incur minor performance losses when making comparisons (for instance, x < -1 now requires 2 machine-level comparisons instead of just 1), but I for one would still prefer it that way.
@R: 2's complement isn't the only issue though - e.g. some CPUs (DSPs, for example) have saturating arithmetic rather than modulo arithmetic.
Modular arithmetic is required for supporting unsigned types anyway... if your machine doesn't have it I guess you have to implement it. One approach might be using the top bit as padding and zeroing it after each operation.
It's not just different machines; it's different optimization levels within the same compiler on the same machine: kuliniewicz.org/blog/archives/2011/06/11/…
@R.. You could keep a sign bit. You don't have to use modulo arithmetic to support signed or unsigned values -- personally, I don't use modulo arithmetic when I do my own calculations and I do support unsigned calculations.

The undefined behavior bits in the specification exist to enable compiler optimizations. For example:

if (a > 0 && b > 0) {
    if ( a + b <= 0 ) {
       // this branch may be optimized out by compiler
    } else {
       // this branch will always run
    }
}

Modern C compilers are not that simple; they do lots of guessing and optimization.
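For completeness (this is not part of the original answer), the usual way to detect overflow without relying on the wrapped result is to test against the limits before performing the addition; a minimal sketch with an illustrative name checked_add:

#include <limits.h>

/* Returns 1 and stores a + b in *result if the sum fits in an int;
   returns 0 (leaving *result untouched) if the addition would overflow. */
int checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *result = a + b;
    return 1;
}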

5 Comments

Good point, but this only means that the C compiler won't help you detect it.
Another point: while the compiler would be able to detect this arithmetic condition, some bit operations would be able to fool it so that the condition won't be optimized out. Also, as suggested by nategoose, we could just declare the sum as volatile.
This is not a valid use of the volatile keyword. If it works, it's a result of a specific implementation, not specified behavior. I.e. it's still undefined behavior.
One could achieve such optimizations without requiring integer overflow to be Undefined Behavior, if one specifies that the result of an operation yielding a value outside the range of int may yield any mathematical integer which would be congruent with the arithmetically-correct result, mod (2*INT_MAX+2), and that if such a value is stored to an int variable any read of the variable may, independently, likewise behave as any arbitrary mathematical integer in the same congruence class. Such a rule would allow a lot of useful optimizations, and also fit a common execution model...
...where values may be held in registers longer than int. Indeed, if I had my druthers, I would define a standard term for such values "Partially-Indeterminate" and specify that an implementation could define the behavior of out-of-range signed-integer downcast as yielding a Partially-Indeterminate Value. Otherwise types like uint32_t are apt to yield UB when used with machines whose int size isn't as expected.

I think your assumption 1), that this can be switched off for any given processor, has been false on at least one important historical architecture, the CDC, if my memory is correct.

5 Comments

This one is really old. I'd rather handle these cases with #ifdef.
@ruslik: The question is not how to handle this, #ifdef is just one way among other to avoid UB. But you see, as you say yourself, you would handle this somehow.
Backwards compatibility has its limits. In this case I'd prefer to write buggy but simple code.
@ruslik: sure your mileage may vary. Certainly depends if you write your code for yourself or if you expect it to end up in a library or so. I find all this fuss that we have gone through (and still go) with 32 vs. 64 bit portability a real shame for the whole profession. And we will see this again for the next step. So, no, I for myself I try to stick to the standards.
The assumption that trapping can be switched off is correct, because, as the implementor of the compiler, you can simply choose never to generate signed arithmetic opcodes and always use the clean/safe unsigned ones. Then you get two's complement for free at the same time.
