fix grammar
Kilian Foth

How do parsers search for token patterns?

Could you explain how parsers search for token patterns, like the ones in Markdown?

I could probably come up with something that matches only the bracket pattern []() but as soon as nested patterns are involved, it blows my mind.

For example, in something like this:

foo [**baz**](baz) qux

the tokenizer probably splits the string into these tokens:

"foo ", "[", "**", "baz", "**", "]", "(", "baz", ")", " qux"

and passes them to the parser, which recognizes the patterns: that it's a link, that the brackets match, and even that the label contains bold text.
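
To make my guess concrete, here is a minimal sketch in Python of the tokenizer I have in mind. The `TOKEN_RE` pattern and the `tokenize` name are my own assumptions for illustration, not taken from any real Markdown implementation.

```python
import re

# Hypothetical token pattern (an assumption, not a real Markdown grammar):
# try the two-character "**" first, then single delimiters, then runs of
# plain text, and finally a lone "*" that is not part of "**".
TOKEN_RE = re.compile(r"\*\*|[\[\]()]|[^\[\]()*]+|\*")

def tokenize(text):
    """Split text into delimiter tokens and plain-text tokens."""
    return TOKEN_RE.findall(text)

print(tokenize("foo [**baz**](baz) qux"))
# → ['foo ', '[', '**', 'baz', '**', ']', '(', 'baz', ')', ' qux']
```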

I guess it's some kind of state machine, but does it really think that as soon as a [ occurs it might mean something, so it stores the token, and if the subsequent tokens don't match, it discards this state and turns the separator tokens into normal literals? This would mean that it has to go back and change the meaning of everything else if there is no ( after the closing ]. Am I overcomplicating this?
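
The backtracking idea I describe could be sketched roughly like this in Python. The function names (`parse`, `parse_until`, `try_link`, `try_bold`) and the tuple-based node shapes are made up for illustration, and the URL is assumed to be a single text token. I don't know whether real Markdown parsers actually work this way.

```python
def parse(tokens):
    """Parse a flat token list into a list of nodes (strings or tuples)."""
    nodes, _ = parse_until(tokens, 0, None)
    return nodes

def parse_until(tokens, pos, closer):
    """Collect nodes from pos until the closer token or the end of input."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != closer:
        tok = tokens[pos]
        result = None
        if tok == "[":
            result = try_link(tokens, pos)
        elif tok == "**":
            result = try_bold(tokens, pos)
        if result is None:
            # The speculative parse failed (or tok is plain text):
            # treat this single token as a literal and move on.
            nodes.append(tok)
            pos += 1
        else:
            node, pos = result
            nodes.append(node)
    return nodes, pos

def try_bold(tokens, pos):
    """Return (("bold", children), next_pos), or None if no closing ** exists."""
    children, end = parse_until(tokens, pos + 1, "**")
    if end < len(tokens):  # stopped on the closing "**"
        return ("bold", children), end + 1
    return None

def try_link(tokens, pos):
    """Return (("link", label, url), next_pos), or None if the shape is wrong."""
    label, end = parse_until(tokens, pos + 1, "]")
    # Expect exactly: "]", "(", <one url token>, ")"
    if tokens[end:end + 2] == ["]", "("] and tokens[end + 3:end + 4] == [")"]:
        return ("link", label, tokens[end + 2]), end + 4
    return None

tokens = ["foo ", "[", "**", "baz", "**", "]", "(", "baz", ")", " qux"]
print(parse(tokens))
# → ['foo ', ('link', [('bold', ['baz'])], 'baz'), ' qux']
```

When `try_link` or `try_bold` returns `None`, the already-parsed label is thrown away and the tokens are scanned again as literals, which is exactly the "go back and change the meaning" cost I am worried about.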

It looks easy to implement when I look at it, but if I had to invent an algorithm for it, I couldn't.

Tweeted twitter.com/StackSoftEng/status/795415337789431808
deleted 22 characters in body
t3chb0t

Could you explain how parsers search for token patterns like in markdown?

I probably could come up with something matching only the braces pattern []() as soon as nested patterns are involved it blows my mind.

For example in something like this

foo [**baz**](baz) qux

the tokenizer probably explodes the string into these tokens

"foo ", "[", "**", "baz", "**", "]", "(", "baz", ")", " qux"

and passes it to the parser to recognize the patterns, that it's a link and that the braces match and then even understand the bold style inside the label.

I guess it's some kind of a state machine but does it really thinks that as soon as a [ ocurrs it might mean something so store the token and if the subsequent tokens don't match then discard this state and turn the separator tokens into a normal literal. This would mean that it had to go back change the meaning of everything else if there was no ( after the closing ]. Do I think to complex?

It looks like it was easy to implement when I look at it, but if I should invent an algorithm for it, I couldn't.

t3chb0t

How parsers search for token patterns?

Could you explain how parsers search for token patterns like in markdown?

I probably could come up with something matching only the braces pattern []() as soon as nested patterns are involved it blows my mind.

For example in something like this

foo [**baz**](baz) qux

the tokenizer probably explodes the string into these tokens

"foo ", "[", "**", "baz", "**", "]", "(", "baz", ")", " qux"

and passes it to the parser to recognize the patterns, that it's a link and that the braces match and then even understand the bold style inside the label.

I guess it's some kind of a state machine but does it really thinks that as soon as a [ ocurrs it might mean something so store the token and if the subsequent tokens don't match then discard this state and turn the separator tokens into a normal literal. This would mean that it had to go back change the meaning of everything else if there was no ( after the closing ]. Do I think to complex?

It looks like it was almost obvious what each pattern means when I look at it, but if I should invent an algorithm I probably couldn't.