Hottest 'lexer' Answers

16 votes

Accepted

How would you test a lexer?

Your grammar probably has some rules for each token on how it can be produced (for example, that a { signifies a BLOCK_START token, or that a string-literal token is delimited by " characters). ...

Bart van Ingen Schenau

79k

answered Feb 4, 2021 at 13:37
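
A minimal sketch of the rule-by-rule idea in Python, assuming a hypothetical toy lexer `tokenize` whose rules cover only the two examples mentioned in the excerpt (`{` produces `BLOCK_START`, text delimited by `"` produces `STRING`). The token names, the regexes, and the test function names are illustrative assumptions, not taken from the answer.

```python
import re

# Hypothetical token rules for illustration: one regex per token kind.
TOKEN_RULES = [
    ("BLOCK_START", r"\{"),
    ("BLOCK_END", r"\}"),
    ("STRING", r'"[^"\n]*"'),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# One focused test per production rule, plus one for the error path.
def test_block_start():
    assert tokenize("{") == [("BLOCK_START", "{")]


def test_string_literal():
    assert tokenize('"hi"') == [("STRING", '"hi"')]


def test_unterminated_string_is_rejected():
    try:
        tokenize('"hi')
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Because each test exercises a single rule, a failure points directly at the rule that regressed.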

16 votes

If you're writing the lexer yourself, this seems like an ideal case for test-driven development. While “the number of combinations of tokens in a source file can be huge,” the number of branches in ...

Arseni Mourzenko

139k

answered Feb 4, 2021 at 13:37

15 votes

Rewrite or Transpiler - How to move away from a proprietary SAAS solution

Let me take your issues one by one: "the current implementation has bugs": Yep, and when you transpile such code, what makes you think those bugs will not be transpiled as well? With a rewrite, ...

Doc Brown

220k

answered Feb 2 at 15:35

11 votes

Accepted

Should my lexer allow what is obviously a syntax error?

Your lexer is never going to be able to diagnose all syntax errors unless you make it as powerful as the parser itself. This would be a large and totally unnecessary amount of work, and the only ...

Kilian Foth

111k

answered Dec 2, 2018 at 19:39

8 votes

What should be the datatype of the tokens a lexer returns to its parser?

As said in the title, which data type should a lexer return/give the parser? "Token", obviously. A lexer produces a stream of tokens, so it should return a stream of tokens. He mentioned Flex, a ...

Eric Lippert

46.6k

answered Mar 19, 2018 at 23:12
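
To make "a stream of tokens" concrete, here is a rough Python sketch. The `Token` fields (`kind`, `text`, `offset`) and the toy `lex` generator are assumptions made for illustration; the excerpt does not prescribe a particular layout.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    kind: str      # e.g. "NUMBER", "IDENT", "OP"
    text: str      # the exact source characters
    offset: int    # position of the first character in the input


def lex(source: str) -> Iterator[Token]:
    """Lazily yield tokens for digits, identifiers, and single-character operators."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isdigit():
            while i < len(source) and source[i].isdigit():
                i += 1
            yield Token("NUMBER", source[start:i], start)
        elif ch.isalpha() or ch == "_":
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield Token("IDENT", source[start:i], start)
        else:
            i += 1
            yield Token("OP", ch, start)


tokens = list(lex("x1 = 42"))
```

A generator lets the parser pull tokens on demand instead of waiting for the whole input to be lexed first.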

7 votes

How would you test a lexer?

One alternative that others aren't mentioning is to use a generative testing approach, like QuickCheck from Haskell, to generate the edge cases from the grammar you've defined. Now, once they're generated,...

A T

761

answered Feb 5, 2021 at 2:40
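
As a sketch of the generative idea, the Python below uses only the standard `random` module rather than QuickCheck or a dedicated property-testing library. The toy lexer, the generator `random_token`, and the round-trip property are all assumptions chosen for illustration.

```python
import random
import re

# Hypothetical toy lexer: integers, identifiers, and the operators + - * /.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/]))")


def tokenize(text):
    """Return (kind, lexeme) pairs; raise ValueError if no rule matches."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot lex at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def random_token(rng):
    """Generate one (kind, lexeme) pair that the grammar above should accept."""
    kind = rng.choice(["NUM", "ID", "OP"])
    if kind == "NUM":
        return kind, str(rng.randrange(10**6))
    if kind == "ID":
        head = rng.choice("abcxyz_")
        tail = "".join(rng.choice("abc_019") for _ in range(rng.randrange(4)))
        return kind, head + tail
    return kind, rng.choice("+-*/")


def check_round_trip(trials=500, seed=0):
    """Property: lexing space-separated random tokens gives the same tokens back."""
    rng = random.Random(seed)
    for _ in range(trials):
        expected = [random_token(rng) for _ in range(rng.randrange(8))]
        source = " ".join(lexeme for _, lexeme in expected)
        assert tokenize(source) == expected, source
    return trials
```

Seeding the generator keeps a failing case reproducible, which is roughly what property-testing tools provide out of the box.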

6 votes

Rewrite or Transpiler - How to move away from a proprietary SAAS solution

Obviously you ask "rewrite or transpile?", but I'm not clear what the underlying issue is. You mention the existing implementation is riddled with bugs. You mention that there is limited ...

Steve

12.6k

answered Feb 2 at 13:25

4 votes

Accepted

Do lexers have to go word by word or can they go line by line

If you have a grammar, then that should be your guide. Going line-by-line is reasonable in a grammar and would simply include newlines in the grammar as starting or finishing syntactic constructs (...

Erik Eidt

34.8k

answered Aug 8, 2021 at 23:28

4 votes

Accepted

Is it a good idea to let keywords have different lexical rules from names of types, variables, functions, etc?

Distinguishing keywords/operators from user-defined names is not strictly necessary. Scannerless parsers can do just fine regardless. For example, it would be feasible to define a language where the ...

amon

136k

answered Apr 23, 2023 at 9:44

3 votes

Do any programming languages let you use other languages without restriction within them?

Not really. Language interop is a difficult problem, and language embedding even more so. Many languages have nontrivial syntax constructs that cannot be easily parsed by a general purpose parser. ...

amon

136k

answered May 30, 2018 at 19:49

2 votes

Should my lexer allow what is obviously a syntax error?

First a caveat: It very much depends on which subset of HTML. HTML5 does not really have the concept of errors at all. Basically any sequence of characters is valid and has a defined parse. I will ...

JacquesB

62.4k

answered Dec 2, 2018 at 22:02

2 votes

Accepted

How is it possible to store the AST nodes location in the source code?

Yes, you have described the standard approach. Creating a raw text node type which has a line number, and then having others inherit from that might be attractive. Error messages will typically want ...

J_H

7,891

answered Jan 19, 2023 at 21:54
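
A small Python sketch of the inheritance approach described in the excerpt: a base node type that carries a source position, with other node types deriving from it. The class names, the `column` field, and the `describe_error` helper are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    line: int    # 1-based line number
    column: int  # 1-based column number


@dataclass(frozen=True)
class Node:
    """Base class: every AST node remembers where it started in the source."""
    location: SourceLocation


@dataclass(frozen=True)
class NumberLiteral(Node):
    value: int


@dataclass(frozen=True)
class BinaryOp(Node):
    operator: str
    left: Node
    right: Node


def describe_error(node: Node, message: str) -> str:
    """Format a diagnostic that points at the node's position."""
    return f"{node.location.line}:{node.location.column}: error: {message}"


# Hand-built tree for the source text "1 / 0" on line 3.
zero = NumberLiteral(SourceLocation(3, 5), 0)
expr = BinaryOp(SourceLocation(3, 1), "/", NumberLiteral(SourceLocation(3, 1), 1), zero)
print(describe_error(expr.right, "division by zero"))
```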

2 votes

Accepted

How does a lexer handle template strings?

Two typical solutions: give up on using a separate lexer. This is easy and efficient with top-down parsing approaches such as recursive descent, PEG, or parser combinators. Such an approach makes it ...

amon

136k

answered Aug 21, 2021 at 7:01

2 votes

How Should Lexers Be Stateful?

It depends on the kind of languages and it depends on whether you see statefulness on the input or the output side of the lexer. On the input side, lexers are often stateful: If you parse a string ...

Christophe

82.2k

answered Jun 13, 2023 at 21:47

2 votes

Rewrite or Transpiler - How to move away from a proprietary SAAS solution

Why the whole hog? You have listed numerous issues that make wholesale change risky: novice staff, tight budgets, third-party supporters (misaligned goals), buggy/weird behaviors, lack of tests. From the ...

Kain0_0

16.6k

answered Feb 3 at 0:49

1 vote

Rewrite or Transpiler - How to move away from a proprietary SAAS solution

Transpiler or rewrite? Just convert it. Easily 80% of my entire career has been doing this. You get one requirement: make it do what it did before on this new system. It isn't a rewrite and you don't ...

candied_orange

120k

answered Feb 2 at 23:48

1 vote

Accepted

Concatenating strings given a BNF grammar

A production rule for an empty string (or equivalently an empty token) can always succeed by consuming nothing from the input. So, when you peek '(', the parser first tries the production <Letter...

Bart van Ingen Schenau

79k

answered May 23, 2024 at 6:25

1 vote

How Should Lexers Be Stateful?

They should not be stateful. No mutations rightfully belong in a lexer. All you're doing is transforming one stream (usually characters) into another (usually strings). That sort of thing is best ...

Telastyn

110k

answered Jun 13, 2023 at 13:56

1 vote

Do lexers have to go word by word or can they go line by line

"It depends." In early languages such as the original FORTRAN, and some COBOLs, which assumed that input would be provided on 80-column punched cards, we have the notion of a continuation ...

Mike Robinson

1,821

answered Aug 9, 2021 at 2:49

1 vote

Should my lexer allow what is obviously a syntax error?

Your lexer should catch syntax errors for malformed tokens, and only those. But in general, your tokens should be complex enough to avoid returning tokens whose sole purpose is to delimit other token ...

Diane M

2,116

answered Dec 3, 2018 at 13:32

1 vote

Should my lexer allow what is obviously a syntax error?

It depends on the scope of your target language and your use cases. If you want to consider </b> (but not </ b>) as a keyword then the lexer can identify that as such. </b would ...

Telastyn

110k

answered Dec 2, 2018 at 19:40

1 vote

Differences between enumeration-based and hierarchical token typing

An enumeration is the classic/C-ish way to define tokens, but that doesn't make it extraordinarily good – precisely because it is difficult to keep track of associated values. A token might contain ...

amon

136k

answered Nov 7, 2018 at 17:17
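
The contrast can be sketched in Python roughly as follows. All names here (`TokenKind`, `EnumToken`, `Number`, `Ident`, `Plus`, `describe`) are made up for illustration and are not the answer's own code.

```python
from dataclasses import dataclass
from enum import Enum, auto


# Enumeration-based: one token type; the payload field is loosely typed
# and only meaningful for some kinds.
class TokenKind(Enum):
    NUMBER = auto()
    IDENT = auto()
    PLUS = auto()


@dataclass(frozen=True)
class EnumToken:
    kind: TokenKind
    value: object = None  # int for NUMBER, str for IDENT, unused for PLUS


# Hierarchical: one class per token kind, each carrying exactly its own data.
@dataclass(frozen=True)
class Token:
    pass


@dataclass(frozen=True)
class Number(Token):
    value: int


@dataclass(frozen=True)
class Ident(Token):
    name: str


@dataclass(frozen=True)
class Plus(Token):
    pass


def describe(token: Token) -> str:
    """Dispatch on the token's class instead of on an enum field."""
    if isinstance(token, Number):
        return f"number {token.value}"
    if isinstance(token, Ident):
        return f"identifier {token.name}"
    if isinstance(token, Plus):
        return "plus sign"
    raise TypeError(f"unknown token {token!r}")
```

In the enum version nothing stops a `PLUS` token from carrying a stray value, while in the hierarchy each class declares only the fields that make sense for it.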
