Re: Parsing with Regexes and Beyond

by ikegami (Patriarch)
on Jun 07, 2008 at 09:11 UTC


in reply to Parsing with Regexes and Beyond

Good!

Small nit: qr{#.*\n?$}m can be written simply as qr{#.*\n?}

Small problem: 3 + 4 6 5 7 is considered valid by your parser. You need to check to make sure all tokens were absorbed or return an EOF token and check if that's the next token. The latter method would remove the need for "no warnings 'uninitialized';".
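To illustrate the EOF-token approach, here is a minimal sketch. It is not the tutorial's code: the token names (Int, Op, EOF) and the helpers lex and parse_sum are made up for this example, and the grammar is reduced to sums and differences of integers. Because the stream always ends in an EOF token, the parser never reads past the end of the array, so there are no undefined values to silence.

```perl
use strict;
use warnings;

# Hypothetical lexer: returns [type, text] pairs and always
# terminates the stream with an explicit EOF token.
sub lex {
    my ($input) = @_;
    my @tokens;
    while (1) {
        $input =~ /\G\s+/gc;    # skip whitespace, if any
        if    ($input =~ /\G(\d+)/gc)  { push @tokens, ['Int', $1] }
        elsif ($input =~ /\G([-+])/gc) { push @tokens, ['Op',  $1] }
        elsif ($input =~ /\G\z/gc)     { push @tokens, ['EOF', '']; last }
        else {
            die sprintf "Unexpected input at offset %d\n", pos($input) // 0;
        }
    }
    return \@tokens;
}

# Hypothetical parser for:  sum: Int ( Op Int )* EOF
sub parse_sum {
    my ($tokens) = @_;
    my $i = 0;

    # Consume a token of the given type or die. The index never moves
    # past the EOF token, so $tokens->[$i] is always defined.
    my $expect = sub {
        my ($type) = @_;
        my $tok = $tokens->[$i];
        die "Expected $type but found $tok->[0] '$tok->[1]'\n"
            unless $tok->[0] eq $type;
        $i++;
        return $tok->[1];
    };

    my $value = $expect->('Int');
    while ($tokens->[$i][0] eq 'Op') {
        my $op  = $expect->('Op');
        my $rhs = $expect->('Int');
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    $expect->('EOF');    # rejects leftover tokens such as "6 5 7"
    return $value;
}
```

With this, parse_sum(lex("3 + 4")) returns 7, while the input 3 + 4 6 5 7 dies at the final EOF check instead of being silently accepted.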

A big difference between your parser and PRD (Parse::RecDescent) is that yours doesn't allow backtracking. With PRD, you can do "ruleA: ruleB | ruleC" without worrying if ruleB and ruleC can start with the same token. With yours, you'd need to factor out the common leading token ("ruleA: token ( ruleB | ruleC )"). But that's probably not a bad thing, since you should do that in PRD anyway for performance reasons. On the plus side, I think backtracking gives PRD an "infinite" lookahead.

I think you could improve performance by generating a single regexp from @token_def (/\G(?>$ws)(?>$tok1(?{...})|$tok2(?{...})|...)/g).

Also, it would be better if the lexer was an iterator instead of doing all the lexing up front. That would decrease the memory footprint.
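A minimal sketch of what an iterator-style lexer could look like, assuming the same kind of [type, text] tokens as above. The name make_lexer and the token set are hypothetical, not taken from the tutorial. The closure keeps its place in the input through pos(), so only one token exists at a time.

```perl
use strict;
use warnings;

# Hypothetical iterator lexer: returns a closure that yields one
# [type, text] token per call, then an EOF token, then undef.
sub make_lexer {
    my ($input) = @_;
    my $done = 0;
    return sub {
        return if $done;
        $input =~ /\G\s+/gc;    # skip whitespace, if any
        if ($input =~ /\G(\d+)/gc)      { return ['Int', $1] }
        if ($input =~ /\G([-+*\/])/gc)  { return ['Op',  $1] }
        if ($input =~ /\G\z/gc)         { $done = 1; return ['EOF', ''] }
        die sprintf "Unexpected character at offset %d\n", pos($input) // 0;
    };
}

# Usage: pull tokens on demand instead of building a full list.
my $next_token = make_lexer("1 + 22");
while (my $tok = $next_token->()) {
    print "$tok->[0] '$tok->[1]'\n";
}
```

A parser that needs one token of lookahead would keep the most recently fetched token in a variable and call the closure only when it consumes that token.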

Replies are listed 'Best First'.
Re^2: Parsing with Regexes and Beyond
by moritz (Cardinal) on Jun 07, 2008 at 09:31 UTC
    Small problem: 3 + 4 6 5 7 is considered valid by your parser. You need to check to make sure all tokens were absorbed or return an EOF token and check if that's the next token. The latter method would remove the need for "no warnings 'uninitialized';".

    Very good catch, thank you. I'll update the tutorial accordingly.

    I think you could improve performance by generating a single regexp from @token_def (/\G(?>$ws)(?>$tok1(?{...})|$tok2(?{...})|...)/g).

    Since (?{...}) is marked as experimental even in perl 5.10.0 and I (re)discovered some bugs in it, I won't use it, at least not in a tutorial at this level. Thanks for the suggestion anyway, it made me think about 5.10's named captures that could be used instead.
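As a rough sketch of the named-capture idea (requires perl 5.10 or later): each entry of a @token_def-style list becomes one named group in a single alternation, and %+ reports which group matched. The layout of @token_def, the $ws pattern, and the lex helper below are assumptions for this example and may not match the tutorial's definitions. The token names must be valid capture-group names.

```perl
use strict;
use warnings;
use 5.010;

# Assumed shape of the token table: [ name => pattern ] pairs.
my @token_def = (
    [ Int => qr/\d+/     ],
    [ Op  => qr/[-+*\/]/ ],
);
my $ws = qr/\s*/;

# Build one alternation with a named group per token type,
# e.g. (?<Int>...)|(?<Op>...)
my $alternation = join '|',
    map { sprintf '(?<%s>%s)', $_->[0], $_->[1] } @token_def;
my $token_re = qr/\G(?>$ws)(?:$alternation)/;

sub lex {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /$token_re/gc) {
        # %+ only lists groups that actually captured, so after a
        # successful match it holds exactly one key: the token type.
        my ($type) = keys %+;
        push @tokens, [$type, $+{$type}];
    }
    # Anything other than trailing whitespace is an error.
    $input =~ /\G\s*\z/gc
        or die sprintf "Cannot tokenize input at offset %d\n", pos($input) // 0;
    return \@tokens;
}
```

This avoids (?{...}) entirely, at the cost of one hash lookup per token to find out which alternative matched.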

    Also, it would be better if the lexer was an iterator instead of doing all the lexing up front. That would decrease the memory footprint.

    Indeed. I decided against it because it slightly increases complexity (or at least makes it harder to understand), but I should at least mention it.

    I guess I'll find some time tomorrow to incorporate your suggestions.

      I won't use it, at least not in a tutorial at this level.

      I decided against it because it slightly increases complexity

      And that's why I didn't code up a solution for the last two tips. They were more along the lines of "Where do we go from here?". I meant to say as much, but I was being rushed away.
