Consider a LALR(1) parser for a file format that allows integer numbers and floating point numbers.
As usual, something like 42 shall be a valid integer and a valid float (with some automagic conversion in the background).
There might be parsing rules where a floating point number or an integer number is expected, and other rules where only an integer number is expected, e.g.:
foo1
: bar FLOAT buzz
| bar INT buzz
;
foo2
: some INT other stuff
;
Now consider something like
foo3
: bar FLOAT xyz FLOAT abc FLOAT buzz
;
but at each position in this rule, INT shall be allowed in place of FLOAT as well.
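To make the size of the problem concrete, here is a minimal Python sketch that writes out every FLOAT/INT combination for a rule like foo3. The helper name expand_rule and the NUM placeholder are hypothetical, invented only for this illustration; they are not part of any parser generator.

```python
from itertools import product


def expand_rule(name, symbols, placeholder="NUM", choices=("FLOAT", "INT")):
    """Return (yacc-style rule text, number of alternatives).

    Every occurrence of `placeholder` in `symbols` is replaced by each of
    `choices` in turn, giving one alternative per combination.
    `placeholder` is a hypothetical marker used only in this sketch.
    """
    slots = symbols.count(placeholder)
    alternatives = []
    for combo in product(choices, repeat=slots):
        picks = iter(combo)
        # Substitute the next choice at each placeholder position.
        alternatives.append(
            " ".join(next(picks) if s == placeholder else s for s in symbols)
        )
    body = "\n| ".join(alternatives)
    return f"{name}\n: {body}\n;", len(alternatives)


text, count = expand_rule(
    "foo3", ["bar", "NUM", "xyz", "NUM", "abc", "NUM", "buzz"]
)
print(text)
print(count)  # 2**3 alternatives for three numeric positions
```

The number of alternatives doubles with every additional numeric position, so a rule with five numbers would already need 32 of them.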
Turning this rule into 8 rules (one rule for each combination of FLOAT and INT) isn’t an option. (Consider a rule having 4 or 5 numbers...)
Using a rule like
float_or_int : FLOAT | INT;
won’t help, because in general this rule will reduce all INT to float_or_int, and rules like foo2 can no longer be parsed. (Because with a grammar large enough, the one-token lookahead cannot avoid the shift-reduce conflicts resulting from this rule.)
When the lexer sees a number without a decimal point, it cannot decide whether the parser currently expects an int or a float-or-int.
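The lexer's side of the problem can be sketched as follows. This is a minimal illustration in Python, not the actual lexer: the token patterns (plain decimal digits, an optional decimal point, identifier-like words) and the names TOKEN_RE and tokenize are assumptions, and the real file format may differ.

```python
import re

# Assumed token syntax for illustration only.
TOKEN_RE = re.compile(
    r"""
    (?P<FLOAT>\d+\.\d*|\.\d+)   # contains a decimal point
  | (?P<INT>\d+)                # digits only
  | (?P<WORD>[A-Za-z_]\w*)      # stands in for bar, buzz, some, ...
  | (?P<SKIP>\s+)               # whitespace, dropped
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using only the characters seen."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# The same lexeme gets the same token kind in both contexts, because the
# lexer has no knowledge of which grammar rule the parser is working on.
print(tokenize("bar 42 buzz"))
print(tokenize("some 42 other stuff"))
print(tokenize("bar 4.2 buzz"))
```

In both the foo1-style input and the foo2-style input, 42 comes out as INT. The decision whether that INT may also serve as a float therefore has to be made in the grammar, which is exactly where the conflicts arise.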
How can this be handled in an elegant way?
[...] foo1 and foo2 rules from above, add a main : foo1 | foo2;, and add rules bar : INT;, buzz : INT; etc. That’s of course (very) artificial, but in my actual project, which is larger and which I definitely cannot post here, such a rule will create a lot of shift-reduce conflicts. And I already had the same with at least two other parsers. It (of course) depends on the tokens that can follow INT or FLOAT, and if you design a language from scratch, you can take that into account. But when parsing some existing stuff, my experience is that there will be conflicts.
[...] float_or_int rule, but the file format to be parsed doesn’t allow it and I cannot change the file format.