"be consistent" | |
PerlMonks |
Re^10: Block-structured language parsing using a Perl module?
by BrowserUk (Patriarch) on Aug 18, 2012 at 10:16 UTC [id://988187]
> I really don't see any need for the lexer to do all the machinations you're worrying with.

I don't think you read what I wrote, or I wrote it badly. I'm not worrying about any machinations in the lexer. I don't want to write my own lexer at all (and assert that I shouldn't have to).

> it's pretty straightforward,

For this language, maybe so; but imagine how much more straightforward it would be if you didn't have to write it. By definition, the grammar contains all the terminal symbols and the ways those terminal symbols can be combined. The parser could produce your %keywords hash for you from the grammar, and in the process ensure that the grammar and the lexer's hash of tokens remain in synchronisation. But more than that, at each stage it knows which token(s) are legal next in the language going forward from the point it is currently at, so it could inspect the next part of the data and very quickly determine whether what is there makes sense in context. I assert that it not only could, it should.

> and it doesn't worry about tokens other than the current one.

One example does not a (counter-)argument make. More to the point: you extract the next token and pass it to the parser, and the parser rejects it. What do you report? What can you report? About all you can say, given your lexer's lack of context, is:

    [source.file:123:31] Numeric literal '123' not expected at this time.

Not so helpful. Whereas the parser could report something like:

    [source.file:123:31] parsing 'while', expecting '('; got '123'

Which would you prefer?

> Writing character-by-character grammar rules to recognize numbers, keywords, strings, comments, etc. would be a pain in BNF. I don't really look forward to writing a zillion BNF productions to specify what tokens look like character by character. But that sort of thing is trivial for regexes, so I split the lexer out into a simple bit of regex code to create the tokens, and the grammar is relatively straightforward too.

But why would you? Why not supply your identifier syntax, literal syntax, etc. to the parser as regexes?

I don't anticipate changing anyone's mind immediately. I'm expressing my reasoning for rejecting the module, but that won't make it disappear from CPAN, or stop anyone who wants to use it from doing so. The job I'm taking on that requires a real parser is sufficiently long-term and complex that it is worth my trying to avoid the duplication of effort, and the parallel resource maintenance, that I see being required by using Marpa. Even if that means writing my own parser that generates a lexer as part of the grammar compile step.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
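To make the approach argued for above concrete, here is a minimal sketch of a grammar whose terminals are supplied as regexes, with the keyword table derived from the grammar rather than maintained by hand, and with parse errors that name both the construct being parsed and the token expected next. It is written in Python rather than Perl purely for brevity; every name in it is hypothetical, and it is not Marpa's API.

```python
import re

# Hypothetical terminal definitions, supplied to the parser as regexes
# (the "identifier syntax, literal syntax, etc." idea from the post).
TERMINALS = {
    "WHILE":  r"while\b",
    "LPAREN": r"\(",
    "RPAREN": r"\)",
    "NUM":    r"\d+",
    "IDENT":  r"[A-Za-z_]\w*",
}

# The %keywords hash falls out of the grammar for free: any terminal
# whose pattern is a literal word ending in \b is a keyword.
KEYWORDS = {pat[:-2] for pat in TERMINALS.values() if pat.endswith(r"\b")}

# One grammar rule, flattened to a sequence of terminal names:
# the header of a while loop, i.e.  while ( NUM )
WHILE_HEADER = ["WHILE", "LPAREN", "NUM", "RPAREN"]

def parse(src, rule=WHILE_HEADER, filename="source.file"):
    """Match `rule` against `src`. Because the grammar drives the
    lexing, only the token expected next is looked for, and a failure
    can report both the context and the expectation."""
    pos, tokens = 0, []
    for expected in rule:
        while pos < len(src) and src[pos].isspace():  # skip whitespace
            pos += 1
        m = re.compile(TERMINALS[expected]).match(src, pos)
        if not m:
            nxt = re.match(r"\w+|\S|$", src[pos:])
            raise SyntaxError(
                f"[{filename}:1:{pos + 1}] parsing {rule[0].lower()!r}, "
                f"expecting {expected!r}; got {nxt.group(0) or '<eof>'!r}")
        tokens.append((expected, m.group(0)))
        pos = m.end()
    return tokens
```

With this, `parse("while (42)")` succeeds, while `parse("while 123)")` fails with a message in the shape argued for above ("parsing 'while', expecting 'LPAREN'; got '123'"), and `KEYWORDS` comes out as `{'while'}` without ever being written down separately from the grammar.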
In Section: Seekers of Perl Wisdom