One advantage of splitting the parser and lexer is that rather than having one humongous state machine that has to cover both grammar and lexing, you can split the task into two smaller machines. As you know, state machine complexity tends to grow exponentially, so two state machines half the size of a combined one can be *much* more tractable.
This is where I have the problem: I don't see the need for two state machines. Nor even a bigger state machine.
The state machine built by the grammar compiler knows (should know!) everything needed, and has all the state required, to perform the tokenising. Nothing extra is required, unless you force the unnecessary split between the parsing and the lexing, at which point you have to duplicate everything.
An example. In Perl, there is an unfortunate ambiguity in the parsing of print statements:
print ( 2 + 3 ) * 4;;
print (...) interpreted as function at (eval 9) line 1, <STDIN> line 1.
5
Note: There is no error relating to the * 4 bit. Why? Because print returns a number which can be multiplied:
print print ( 2 + 3 ) * 4;;
print (...) interpreted as function at (eval 10) line 1, <STDIN> line 2.
5
4
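To make the arithmetic concrete, here is a rough Python sketch (not Perl) of how the two statements above evaluate once the parens are taken as function arguments. It assumes print returns 1 on success, which matches the 4 printed above; perl_print and printed are names invented purely for illustration.

```python
# Toy stand-in for Perl's print: records what would be output and returns 1.
printed = []

def perl_print(*args):
    printed.append("".join(str(a) for a in args))
    return 1

# print ( 2 + 3 ) * 4;  is parsed as  (print(2 + 3)) * 4
first = perl_print(2 + 3) * 4        # outputs 5; the product (4) is discarded

# print print ( 2 + 3 ) * 4;  is parsed as  print( (print(2 + 3)) * 4 )
perl_print(perl_print(2 + 3) * 4)    # inner call outputs 5, outer outputs 4
```

In the first statement the multiplication happens but its result goes nowhere, which is why there is no complaint about the * 4.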
With sufficient knowledge, this ambiguity could be resolved, and the parser should have enough knowledge (state) to do so. It knows whether the print statement is in a void context, as in the first case above, in which case it should treat the parens as grouping rather than function args; or in a list context, as in the second example, where treating them as function args is perhaps the better choice.
Marpa requires that you label the tokens when the lexer passes them to the parser (using the misnamed ->read( 'label', value ) method). That means the lexer has to have already recognised what the token is, which means that it needs to maintain the same state as the parser in order to make that determination.
At that point, what's the purpose in passing the token to the parser? Do we need a second opinion?
It is easy to see how Marpa can make the claim that "Marpa does not use lookahead and never backtracks.": it's because it makes the lexer do all the lookahead and backtracking! (And therefore duplicate all the state required to do so.)
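The duplication complained about above can be sketched with a toy labelling lexer. This is Python, not Perl, and it is not Marpa's API: label_tokens and the labels PRINT, CALL_OPEN, GROUP_OPEN and so on are all made up for illustration. It applies the resolution suggested earlier (void context means grouping, list context means function args), which is not what Perl itself does. The point is only that the lexer cannot label the paren without tracking grammatical state of its own.

```python
import re

TOKEN_RE = re.compile(r"\s*(print|\d+|[()+*;])")

def label_tokens(source):
    """Return (label, value) pairs for a tiny print/arithmetic language."""
    tokens = []
    open_labels = []            # label given to each still-open paren
    at_statement_start = True   # grammatical state the lexer has to track
    print_context = None        # set only while the previous token is print
    source = source.strip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        text = match.group(1)
        pos = match.end()
        next_print_context = None
        if text == "print":
            label = "PRINT"
            # A print that opens a statement is in void context;
            # one that appears inside an expression is in list context.
            next_print_context = "void" if at_statement_start else "list"
        elif text == "(":
            label = "CALL_OPEN" if print_context == "list" else "GROUP_OPEN"
            open_labels.append(label)
        elif text == ")":
            if not open_labels:
                raise ValueError("unbalanced ')'")
            opened = open_labels.pop()
            label = "CALL_CLOSE" if opened == "CALL_OPEN" else "GROUP_CLOSE"
        elif text == ";":
            label = "SEMI"
        elif text in "+*":
            label = "OP"
        else:
            label = "NUMBER"
        tokens.append((label, text))
        at_statement_start = (text == ";")
        print_context = next_print_context
    return tokens
```

Each pair returned here is what would be handed to a ->read( 'label', value ) style call. Note that at_statement_start, print_context and the paren stack are all things a parser for the same grammar would already be tracking.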
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".