17
votes
Accepted
How does the GLL parsing algorithm work?
The reason LL grammars can't handle left-recursion or ambiguity is because those are both situations where the grammar has to make a choice based on something that it won't discover until the future. ...
16
votes
How would you test a lexer?
If you're writing the lexer yourself, this seems like an ideal case for test-driven development.
While “the number of combinations of tokens in a source file can be huge,” the number of branches in ...
16
votes
Accepted
How would you test a lexer?
Your grammar probably has some rules for each token on how it can be produced (for example, that a { signifies a BLOCK_START token, or that a string-literal token is delimited by " characters). ...
15
votes
Accepted
What are the 'practical' advantages of LR parser over LL parser 'in today'?
True LL parsers are fast and simple – but also completely unable to parse many interesting languages. Many real-world programming languages are in the LALR class, which is closely related to LR but ...
15
votes
Rewrite or Transpiler - How to move away from a proprietary SAAS solution
Let's me take on your issues one-by-one:
the current implementation has bugs
Yep, and when you transpile such a code, what makes you think those bugs will not be transpiled as well? With a rewrite, ...
12
votes
How to Unit-Test a parser of a file?
I want to write tests for it.
What you are intending to test?
I want to use TDD. I'm refactoring a parser and want to test the 'parse()' method.
So the aim is to clean things up.
I would argue that ...
11
votes
Accepted
Should my lexer allow what is obviously a syntax error?
Your lexer is never going to be able to diagnose all syntax errors unless you make it as powerful as the parser itself. This would be a large and totally unnecessary amount of work, and the only ...
10
votes
How can one interpret an Abstract Syntax Tree without recursion?
You can systematically calculate such an approach. For simplicity, I'll consider evaluating arithmetic expressions, but the transforms are completely mechanical. I'll use Haskell as the language.
data ...
10
votes
Accepted
C++ Recursive Descent Parser: Global Variable Dilemma
The group of methods which belong to one parser are tightly coupled because of the nature of the problem. The parser has some state to manage which is shared between all those functions. So putting ...
10
votes
Accepted
Reasons to use (and not to use) a repeated delimiter to escape that delimiter?
It seems the parser would need to recognize the ', but then conditionally undo that decision based on the next character. While this a fairly minor check, it seems strange to me given the alternative ...
10
votes
Is it possible to make a compiler for any dynamic/script/interpreter language
Can we (Is it possible to) make a compiler for any dynamic/script/interpreter language,
Yes, it is.
Note that there is no such thing as an "interpreter" language. An interpreter is a ...
9
votes
How to Unit-Test a parser of a file?
This approach may work but as far as I understand from unit test methodology, unit-tests shouldn't perform I/O
I think you are overlooking the elephant in the room here: your parser should not perform ...
9
votes
Is it possible to make a compiler for any dynamic/script/interpreter language
Yes, it is possible to create a “compiler” for any language, no matter how dynamic. But this isn't necessarily a useful thing to do.
1. Many languages like Ruby, Python, and JavaScript are extremely ...
8
votes
Should a SVG be converted to JSON to be stored in the database?
An other software engineer in my team told me to use JSON but couldn't substantiate why?
Maybe it was a joke? I can't think of any benefit to storing SVG inside JSON. None.
I would store SVG files......
8
votes
Accepted
Is a literal out of range a syntax error or a semantic error?
It depends on the grammar of the language.
If the grammar precludes recognizing 256 or 257, then those character sequences are syntax errors — that character sequence is outside of what can be ...
7
votes
Accepted
Meaningful response to the user after his uploaded CSV was processed?
This depends on the way the CSV is created, maintained or fixed in case of errors, and on how the imported data will be processed after.
Let's start with the question of "all or nothing" vs. "reject ...
7
votes
Accepted
How do different file types generally store data?
The reason there are many different file formats is that there are many different goals for the way data is formatted. Some of these are in opposition to each other and some are orthogonal to each ...
7
votes
Accepted
Are parsers a special case in unit testing?
Parsers are not a special case, what you desribe is somewhat common. If the class in question is of a reasonable size, adheres to seperation of concerns, single-responsibility principle and so on, ...
7
votes
Accepted
Backend - own database vs live parsing
I strongly recommend using a database with a well-defined schema. Why? Independence/Decoupling. With a database, things are under your control, not theirs. Lets look at the pros and cons of each ...
7
votes
Should a SVG be converted to JSON to be stored in the database?
Neither.
You should store the model the application uses to generate the svg. ie for a network diagram the various nodes and joins.
That way when you change the svg generation code the old models ...
7
votes
Accepted
How can one interpret an Abstract Syntax Tree without recursion?
You need an operand stack (to store intermediate values) as well as a call stack where you place something to remind you what to do next when you return from a sub-activity. Most CPUs mix those two ...
7
votes
Accepted
Is it wrong to parse YAML `true` as a string?
YAML is an extremely complicated format, and the result of that document may differ between YAML versions and parsers. YAML itself does not prescribe a data model beyond scalars/sequences/mappings/...
7
votes
How would you test a lexer?
One alternative that others aren't mentioning, is to use a test generative approach—like QuickCheck from Haskell—to generate the edge cases from the grammar you've defined.
Now, once they're generated,...
6
votes
Accepted
Which is a more efficient approach to decoding escape sequences in text?
Most parsers decode string escapes during tokenization. Note that the tokenizer needs to be aware of escapes anyway to determine the end of a string, since the string delimiter " itself may be escaped....
6
votes
How exactly is an Abstract Syntax Tree created?
I know this question is 4+ years old but I feel I should add a more detailed answer.
Abstract Syntax Trees are created no differently from other trees; the more true statement in this case is that ...
6
votes
Accepted
AST design: Call is both expression and statement?
The AST is a bridge between the concrete syntax and the semantic model of your language. It does not have to conform exactly to the syntax. In particular, it is generally OK if the AST could ...
6
votes
Accepted
Is there a name for these diagrams that show the flow of a valid value
They are called railroad diagrams or syntax diagrams.
Syntax diagrams (or railroad diagrams) are a way to represent a context-free grammar. They represent a graphical alternative to Backus–Naur form ...
6
votes
How does the GLL parsing algorithm work?
Since I don't have enough reputation to comment I am dropping this as an answer.
Adrian Johnstone (the co-author of the GLL parsing paper) has an implementation of the GLL parser generator in his tool ...
6
votes
What type of syntax notation is this?
It is a type of syntax diagram called a railroad diagram. See also here, here, and here.
Thanks to @davidbak who found the answer.
6
votes
Fail fast is brittle
For situations where some data constellation indicates a bug in code, "fail fast" is usually the only viable a strategy - the program shall be stopped, and the programmer or vendor needs to ...
Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
parsing × 301compiler × 41
lexer × 31
programming-languages × 29
grammar × 26
language-design × 22
c++ × 19
java × 18
xml × 16
syntax × 16
c# × 15
algorithms × 14
data-structures × 14
python × 12
html × 12
trees × 12
c × 11
php × 10
design-patterns × 9
design × 8
language-agnostic × 8
interpreters × 8
regular-expressions × 8
javascript × 7
unit-testing × 6