Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: Parsing with Regexes and Beyond

by ikegami (Pope)
on Jun 07, 2008 at 09:11 UTC ( #690816=note: print w/ replies, xml ) Need Help??


in reply to Parsing with Regexes and Beyond

Good!

Small nit: qr{#.*\n?$}m can be written simply as qr{#.*\n?}

Small problem: 3 + 4 6 5 7 is considered valid by your parser. You need to check to make sure all tokens were absorbed or return an EOF token and check if that's the next token. The latter method would remove the need for "no warnings 'uninitialized';".

A big difference between your parser and PRD is that yours doesn't allow backtracking. With PRD, you can do "ruleA: ruleB | ruleC" without worrying if ruleB and ruleC can start with the same token. With yours, you'd need to factor out the common leading token ("ruleA: token ( ruleB | ruleC )"). But that's probably not a bad thing, since you should do that in PRD anyway for performance reasons. On the plus side, I think that gives PRD an "infinite" lookahead.

I think you could improve performance by generating a single regexp from @token_def (/\G(?>$ws)(?>$tok1(?{...}|$tok2(?{...}|...)/g).

Also, it would be better if the lexer was an iterator instead of doing all the lexing up front. That would decrease the memory footprint.


Comment on Re: Parsing with Regexes and Beyond
Select or Download Code
Re^2: Parsing with Regexes and Beyond
by moritz (Cardinal) on Jun 07, 2008 at 09:31 UTC
    Small problem: 3 + 4 6 5 7 is considered valid by your parser. You need to check to make sure all tokens were absorbed or return an EOF token and check if that's the next token. The latter method would remove the need for "no warnings 'uninitialized';".

    Very good catch, thank you. I'll update the tutorial accordingly.

    I think you could improve performance by generating a single regexp from @token_def (/\G(?>$ws)(?>$tok1(?{...}|$tok2(?{...}|...)/g).

    Since (?{...}) is marked as experimental even in perl 5.10.0 and I (re)discovered some bugs in it, I won't use it, at least not in a tutorial at this level. Thanks for the suggestion anyway, it made me think about 5.10's named captures that could be used instead.

    Also, it would be better if the lexer was an iterator instead of doing all the lexing up front. That would decrease the memory footprint.

    Indeed. I decided against it because it slightly increases complexity (or at least makes it harder to understand), but I should at least mention it.

    I guess I'll find some time tomorrow to incorporate your suggestions.

      I won't use it, at least not in a tutorial at this level.

      I decided against it because it slightly increases complexity

      And that's why I didn't code up a solution for the last two tips. They were more along the lines of "Where do we go from here?". I meant to say as much, but I was being rushed away.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://690816]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (6)
As of 2014-12-20 07:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (95 votes), past polls