16
votes
Accepted
How would you test a lexer?
Your grammar probably has some rules for each token on how it can be produced (for example, that a { signifies a BLOCK_START token, or that a string-literal token is delimited by " characters). ...
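As a rough illustration of testing one rule per token, here is a minimal sketch. The rule table, the `lex` function, and the token names other than `BLOCK_START` are hypothetical, chosen only to mirror the two examples in the excerpt (`{` producing `BLOCK_START`, and a string literal delimited by `"` characters).

```python
import re

# Hypothetical token rules, for illustration only.
TOKEN_RULES = [
    ("BLOCK_START", r"\{"),
    ("BLOCK_END", r"\}"),
    ("STRING", r'"[^"\n]*"'),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def lex(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# One small test per production rule, plus one for a malformed token.
def test_block_start():
    assert lex("{") == [("BLOCK_START", "{")]


def test_string_literal():
    assert lex('"hi"') == [("STRING", '"hi"')]


def test_unterminated_string_is_rejected():
    try:
        lex('"hi')
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unterminated string")
```

Each test exercises a single rule in isolation, so a failure points directly at the rule that broke.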
16
votes
How would you test a lexer?
If you're writing the lexer yourself, this seems like an ideal case for test-driven development.
While “the number of combinations of tokens in a source file can be huge,” the number of branches in ...
15
votes
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
Let me take on your issues one by one:
the current implementation has bugs
Yep, and when you transpile such code, what makes you think those bugs will not be transpiled as well? With a rewrite, ...
11
votes
Accepted
Should my lexer allow what is obviously a syntax error?
Your lexer is never going to be able to diagnose all syntax errors unless you make it as powerful as the parser itself. This would be a large and totally unnecessary amount of work, and the only ...
8
votes
What should be the datatype of the tokens a lexer returns to its parser?
As said in the title, which data type should a lexer return/give the parser?
"Token", obviously. A lexer produces a stream of tokens, so it should return a stream of tokens.
He mentioned Flex, a ...
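A minimal sketch of "a stream of tokens" as a data type, assuming a toy arithmetic language. The `Token` fields and the `tokenize` function are hypothetical names used for illustration, not part of Flex or any other tool mentioned in the answer.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    kind: str    # e.g. "NUMBER" or "OP" (assumed categories)
    text: str    # the exact characters matched
    offset: int  # position of the first character in the source


def tokenize(source: str) -> Iterator[Token]:
    """Lazily yield tokens, so the parser can pull them one at a time."""
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
        elif ch.isdigit():
            start = pos
            while pos < len(source) and source[pos].isdigit():
                pos += 1
            yield Token("NUMBER", source[start:pos], start)
        elif ch in "+-*/":
            yield Token("OP", ch, pos)
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {pos}")
```

Returning a generator keeps the lexer and parser decoupled: the parser sees only an iterator of `Token` values and never touches raw characters.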
7
votes
How would you test a lexer?
One alternative that others aren't mentioning is to use a generative testing approach, like QuickCheck from Haskell, to generate the edge cases from the grammar you've defined.
Now, once they're generated,...
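The generative idea can be sketched without QuickCheck itself. The following uses only Python's standard `random` module to generate token sequences, render them to text, and check that lexing gives the same sequence back. The token set and the names `lex`, `random_token`, and `check_round_trip` are assumptions made for the example.

```python
import random
import re

# Hypothetical toy token set: integers, lowercase identifiers, punctuation.
TOKEN_RE = re.compile(r"(?P<INT>\d+)|(?P<IDENT>[a-z]+)|(?P<PUNCT>[{}();])|(?P<WS>\s+)")


def lex(text):
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


def random_token(rng):
    """Generate one (kind, lexeme) pair that the rules above should accept."""
    kind = rng.choice(["INT", "IDENT", "PUNCT"])
    if kind == "INT":
        return (kind, str(rng.randint(0, 9999)))
    if kind == "IDENT":
        return (kind, "".join(rng.choice("abcxyz") for _ in range(rng.randint(1, 6))))
    return (kind, rng.choice("{}();"))


def check_round_trip(trials=200, seed=0):
    """Property: lexing the space-joined lexemes yields the original tokens."""
    rng = random.Random(seed)
    for _ in range(trials):
        expected = [random_token(rng) for _ in range(rng.randint(0, 20))]
        source = " ".join(text for _, text in expected)
        assert lex(source) == expected, source
    return trials
```

A fixed seed keeps failures reproducible. A dedicated property-testing library would additionally shrink failing inputs, which this sketch does not attempt.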
6
votes
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
Obviously you ask "rewrite or transpile?", but I'm not clear what the underlying issue is.
You mention the existing implementation is riddled with bugs. You mention that there is limited ...
4
votes
Accepted
Do lexers have to go word by word or can they go line by line
If you have a grammar, then that should be your guide. Going line-by-line is reasonable in a grammar and would simply include newlines in the grammar as starting or finishing syntactic constructs (...
4
votes
Accepted
Is it a good idea to let keywords have different lexical rules from names of types, variables, functions, etc?
Distinguishing keywords/operators from user-defined names is not strictly necessary. Scannerless parsers can do just fine regardless. For example, it would be feasible to define a language where the ...
3
votes
Do any programming languages let you use other languages without restriction within them?
Not really. Language interop is a difficult problem, and language embedding even more so.
Many languages have nontrivial syntax constructs that cannot be easily parsed by a general purpose parser. ...
2
votes
Should my lexer allow what is obviously a syntax error?
First a caveat: It very much depends on which subset of HTML. HTML5 does not really have the concept of errors at all. Basically any sequence of characters is valid and has a defined parse. I will ...
2
votes
Accepted
How is it possible to store the AST nodes location in the source code?
Yes, you have described the standard approach.
Creating a raw text node type which has a line number,
and then having others inherit from that might be attractive.
Error messages will typically want ...
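One possible shape for the inheritance approach described above, as a sketch only. `SourceSpan`, `Node`, `NumberLiteral`, `BinaryOp`, and `describe_error` are hypothetical names, and storing a column and length alongside the line number is an assumption beyond what the excerpt states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    line: int     # 1-based line number
    column: int   # 1-based column of the first character
    length: int   # number of characters covered


@dataclass(frozen=True)
class Node:
    """Base class: every AST node carries the span it was parsed from."""
    span: SourceSpan


@dataclass(frozen=True)
class NumberLiteral(Node):
    value: int


@dataclass(frozen=True)
class BinaryOp(Node):
    op: str
    left: Node
    right: Node


def describe_error(node: Node, message: str) -> str:
    """Format a diagnostic using the location stored on the node."""
    return f"{node.span.line}:{node.span.column}: {message}"
```

Because the span lives on the base class, any later pass (type checking, linting) can report a location without going back to the lexer.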
2
votes
Accepted
How does a lexer handle template strings?
Two typical solutions:
give up on using a separate lexer. This is easy and efficient with top-down parsing approaches such as recursive descent, PEG, or parser combinators. Such an approach makes it ...
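To illustrate the first option (no separate lexer), here is a small scannerless recursive-descent sketch. It assumes JavaScript-style syntax (backtick delimiters and `${...}` interpolation), which the excerpt does not specify, and it restricts expressions to identifiers or nested templates. All function names are hypothetical.

```python
def parse_template(text, pos=0):
    """Parse a backtick-delimited template at text[pos]; return (parts, next_pos)."""
    if pos >= len(text) or text[pos] != "`":
        raise SyntaxError(f"expected '`' at offset {pos}")
    pos += 1
    parts = []
    literal = []
    while pos < len(text):
        if text[pos] == "`":
            if literal:
                parts.append(("text", "".join(literal)))
            return parts, pos + 1
        if text.startswith("${", pos):
            if literal:
                parts.append(("text", "".join(literal)))
                literal = []
            # Switch from "string mode" to "expression mode" by plain recursion.
            expr, pos = parse_expression(text, pos + 2)
            if pos >= len(text) or text[pos] != "}":
                raise SyntaxError(f"expected '}}' at offset {pos}")
            pos += 1
            parts.append(("expr", expr))
        else:
            literal.append(text[pos])
            pos += 1
    raise SyntaxError("unterminated template string")


def parse_expression(text, pos):
    """Toy expression grammar: a nested template or a bare identifier."""
    if pos < len(text) and text[pos] == "`":
        parts, pos = parse_template(text, pos)
        return ("template", parts), pos
    start = pos
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    if start == pos:
        raise SyntaxError(f"expected an expression at offset {start}")
    return ("name", text[start:pos]), pos
```

Since the parser reads characters directly, the context (inside a string or inside an expression) is implied by which function is running, so no lexer mode switching is needed.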
2
votes
How Should Lexers Be Stateful?
It depends on the kind of languages and it depends on whether you see statefulness on the input or the output side of the lexer.
On the input side, lexers are often stateful: If you parse a string ...
2
votes
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
Why the whole hog?
You have listed numerous issues that make wholesale change risky:
Novice Staff
Tight Budgets
Third Party Supporters (misaligned goals)
Buggy/Weird behaviors
Lack of tests
From the ...
1
vote
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
Transpiler or rewrite?
Just convert it.
Easily 80% of my entire career has been doing this. You get one requirement: make it do what it did before on this new system.
It isn't a rewrite and you don't ...
1
vote
Accepted
Concatenating strings given a BNF grammar
A production rule for an empty string (or equivalently an empty token) can always succeed by consuming nothing from the input.
So, when you peek '(', the parser first tries the production <Letter&...
1
vote
How Should Lexers Be Stateful?
They should not be stateful.
No mutations rightfully belong in a lexer.
All you're doing is transforming one stream (usually characters) into another (usually strings). That sort of thing is best ...
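A sketch of what a mutation-free lexer might look like: a pure function from `(text, position)` to `(token, next position)`, with a thin generator on top. The function names and the word-or-symbol token rule are assumptions made for the example.

```python
import re
from typing import Iterator, Optional, Tuple

# Assumed token rule: a run of word characters, or a single symbol.
TOKEN = re.compile(r"\s*(\w+|[^\w\s])")


def next_token(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Pure function: the same (text, pos) always gives the same result."""
    m = TOKEN.match(text, pos)
    if m is None:
        return None  # only whitespace (or nothing) remains
    return m.group(1), m.end()


def tokens(text: str) -> Iterator[str]:
    """Thread the position through explicitly instead of storing it."""
    pos = 0
    while True:
        result = next_token(text, pos)
        if result is None:
            return
        token, pos = result
        yield token
```

Because `next_token` holds no state, a parser can backtrack simply by reusing an earlier position.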
1
vote
Do lexers have to go word by word or can they go line by line
"It depends."
In early languages such as the original FORTRAN, and some COBOLs, which assumed that input would be provided on 80-column punched cards, we have the notion of a continuation ...
1
vote
Should my lexer allow what is obviously a syntax error?
Your lexer should catch syntax errors for malformed tokens, and only those. But in general, your tokens should be complex enough to avoid returning tokens whose sole purpose is to delimit other token ...
1
vote
Should my lexer allow what is obviously a syntax error?
It depends on the scope of your target language and your use cases. If you want to consider </b> (but not </ b>) as a keyword then the lexer can identify that as such. </b would ...
1
vote
Differences between enumeration-based and hierarchical token typing
An enumeration is the classic/C-ish way to define tokens, but that doesn't make it extraordinarily good – precisely because it is difficult to keep track of associated values. A token might contain ...
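The contrast between the two styles can be sketched as follows. All class and function names are hypothetical, and the three token kinds are an assumed toy set.

```python
from dataclasses import dataclass
from enum import Enum, auto


# Enumeration style: one flat kind tag plus an untyped payload.
class TokenKind(Enum):
    NUMBER = auto()
    IDENT = auto()
    PLUS = auto()


@dataclass(frozen=True)
class EnumToken:
    kind: TokenKind
    text: str  # the consumer must know how to interpret this per kind


def number_from_enum(tok: EnumToken) -> int:
    """The associated value has to be re-derived from the raw text."""
    if tok.kind is not TokenKind.NUMBER:
        raise TypeError(f"not a number token: {tok.kind}")
    return int(tok.text)


# Hierarchical style: each token type declares its own associated values.
@dataclass(frozen=True)
class Token:
    offset: int


@dataclass(frozen=True)
class Number(Token):
    value: int


@dataclass(frozen=True)
class Ident(Token):
    name: str


@dataclass(frozen=True)
class Plus(Token):
    pass
```

With the hierarchy, `Number(offset=0, value=42).value` is already an `int`, whereas the enum version defers that conversion (and its possible errors) to every consumer.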
Related Tags
lexer × 49
parsing × 31
compiler × 12
programming-languages × 11
language-design × 6
grammar × 6
java × 3
design × 2
data-structures × 2
functional-programming × 2
syntax × 2
theory × 2
regular-expressions × 2
parser-combinator × 2
c# × 1
design-patterns × 1
c++ × 1
algorithms × 1
php × 1
python × 1
unit-testing × 1
programming-practices × 1
testing × 1
c × 1
terminology × 1