PerlMonks  

How to parse a limited grammar

by clinton (Priest)
on Jan 10, 2008 at 10:04 UTC (#661593)
clinton has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a reporting module, which allows a (knowledgeable) user to specify the content and format of a report in HTML/Excel/XML/PDF, etc., without having to touch the code.

For instance, this report would return a list of order_line objects (one for each row), then write out each column in the row based on the current order_line object:

    title:          This week's order lines
    object_type:    order_line
    query:          ... query params that return a list of Object IDs ...
    row_vars:
        order:      $order_line.order        # Stores $order
        product:    $order_line.product      # and $product in the %vars hash
    columns:
      - title:      Order
        value:      $order.id                            # $order->id
      - title:      Order date
        value:      $order.create_date.yMonthd($lang)    # $order->create_date->yMonthd($lang)
      - title:      Order total
        value:      $order.total | currency              # currency ($order->total)
        format:
            align:  right
      - title:      Invoice no
        value:      $order.invoice_id || 'Not invoiced'
      - title:      Pickup address
        value:      $order_line.pickup_id
                      ? $order_line.pickup_id | lookup ('pickup',$lang,$order_line.pickup_id)
                      : $order_line.pickup_text

I would have a %vars hash, which would contain (e.g.) the $lang variable and would store the row variables $order_line, $order and $product, and a %function hash, which would contain the lookup() and currency() code refs.

The question: how should I parse the value fields?

The value fields are short, discrete strings in a Template Toolkit (TT) style. The grammar is limited, but allows for quite a lot of flexibility.

My first inclination was to use a few regexes and to string together some coderefs, passing the return value of each coderef as an argument to the next. This would be fine until I hit the logical operators (&&, ||, ?:), which introduce branching into the code.
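A minimal sketch of that coderef chain (not the actual module code): %function here is a hypothetical filter table, and the first stage is a stand-in for the $order.total lookup. Each stage receives the previous stage's return value, which is all a | filter needs.

```perl
use strict;
use warnings;

# Hypothetical filter table standing in for the %function hash.
my %function = (
    currency => sub { sprintf '%.2f', $_[0] },
);

# Chain coderefs: each stage receives the previous stage's return value.
sub make_pipeline {
    my @stages = @_;
    return sub {
        my $value = shift;
        $value = $_->($value) for @stages;
        return $value;
    };
}

# "$order.total | currency": fetch the total, then pass it to currency().
my $pipeline = make_pipeline(
    sub { $_[0]{total} },    # stand-in for $order->total
    $function{currency},
);

print $pipeline->({ total => 12.5 }), "\n";    # prints 12.50
```

A flat list of stages has nowhere to put a ? b : c, because only one of b or c may run; that is the point where a tree of coderefs, or generated Perl, becomes necessary.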

It feels like it should be a small job, so I have been loath to load up a full parser to do it. Having read Higher Order Perl, I have considered using Dominus' HOP::Parser, but I'm wondering if I should bite the bullet and use one of the standard modules like Parse::RecDescent or Parse::Yapp.

I have no experience with any of these modules. What would you recommend to handle what (in my ignorant opinion) should be a light-weight parsing task?

thanks

Clint

UPDATE: I have now posted my parser here: A parser for a limited grammar

Re: How to parse a limited grammar
by TedYoung (Deacon) on Jan 10, 2008 at 16:35 UTC

    Well, if you are confident that the users of your software are trustworthy, you could simply apply one or more s///'s to the value and eval it:

        $value =~ s/\./->/g;
        # ...
        my $generator = eval "sub { $value }";
        die $@ if $@;
        for (@rows) {
            # ...
            print $generator->();
            # ...
        }

    This gives you tremendous power with a minimum amount of work. Though, if you don't completely trust your users, it may give you too much power. You could limit that with Safe, but that has been shown not to be a complete sandbox (it makes it much harder for the user to inadvertently screw up things, but if they have their heart set on it, they can exploit certain flaws).

    HOP::Parser is great but very, very slow. You only need it if you are parsing streams. Here you have very short strings and, if the above solution won't work for you, you can use traditional lexing semantics to build up a list of tokens. Your grammar is probably simple enough to simply iterate over those tokens and generate valid Perl to be eval'ed (or derive the value as you go).

        # untested, just pseudo code
        my @tokens;
        local $_ = $value;
        while ( !/\G$/gc ) {    # while not at end of string
            next if /\G\s+/gc;  # skip whitespace between tokens
            push @tokens,
                  /\G\./gc               ? [ 'DOT', '.' ]
                : /\G(\$\w+)/gc          ? [ 'VAR', $1 ]
                : /\G(\w+)\((.*?)\)/gc   ? [ 'METHOD', $1, [ split /\s*,\s*/, $2 ] ]
                : /\G(\w+)/gc            ? [ 'METHOD', $1, [] ]
                # ... more token types here ...
                : last;
        }

    Note that if you want proper nesting of () in method calls, you will need a better regex. Continue adding tokenizers for the operators (|, &&, and so on). When done, iterate over the list of tokens and either derive the value directly or, if necessary, generate trusted Perl code and eval it.
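    Following that advice, here is a minimal, runnable sketch of such a lexer extended with the operators used in the question (|, ||, &&, ?, :), plus parentheses, commas, numbers and single-quoted strings. The function name tokenize and the token types VAR, IDENT, NUM, STR and OP are my own choices. Rather than using a cleverer regex for nested parentheses, it emits ( and ) as separate OP tokens and leaves the nesting to whatever consumes the tokens.

```perl
use strict;
use warnings;

# Split a value expression into [TYPE, TEXT] pairs using \G and /gc.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length($text)) {
        if    ($text =~ /\G\s+/gc)                 { next }                           # skip whitespace
        elsif ($text =~ /\G\$(\w+)/gc)             { push @tokens, [ VAR   => $1 ] }
        elsif ($text =~ /\G([A-Za-z_]\w*)/gc)      { push @tokens, [ IDENT => $1 ] }
        elsif ($text =~ /\G(\d+(?:\.\d+)?)/gc)     { push @tokens, [ NUM   => $1 ] }
        elsif ($text =~ /\G'((?:[^'\\]|\\.)*)'/gc) { push @tokens, [ STR   => $1 ] }   # escapes kept raw
        elsif ($text =~ /\G(&&|\|\||[.|?:(),])/gc) { push @tokens, [ OP    => $1 ] }   # || tried before |
        else { die 'Unexpected character at position ' . pos($text) . " in '$text'\n" }
    }
    return @tokens;
}

print join(' ', map { join ':', @$_ } tokenize(q{$order.total | currency})), "\n";
# prints VAR:order OP:. IDENT:total OP:| IDENT:currency
```

    The order of the alternatives matters: || must be tried before the single-character class, otherwise it would be split into two | tokens. Backslash escapes inside strings are matched but not decoded.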

    update:

        # again, untested, just for demonstration
        my $code = '';
        for (@tokens) {
            my ($type, $source, @params) = @$_;
            if ($type eq 'VAR') {
                $code .= $source;
            }
            elsif ($type eq 'DOT') {
                $code .= '->';
            }
            elsif ($type eq 'METHOD') {
                # ...
            }
        }
        my $generator = eval "sub { $code }";
        die $@ if $@;
        for (@rows) {
            # ...
            print $generator->();
            # ...
        }
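    One possible way to complete that loop, sketched under my own assumptions: VAR, DOT and METHOD tokens have the shapes produced by the lexer above, while OP and STR are hypothetical extra token types, and Toy::Order is a made-up class standing in for the real order objects. Because &&, || and ?: are already Perl operators, they can be copied straight into the generated code, and Perl's own short-circuiting takes care of the branching.

```perl
use strict;
use warnings;

# Turn a list of [TYPE, SOURCE, @params] tokens into a string of Perl code.
# VAR/DOT/METHOD follow the lexer's token shapes; OP and STR are assumed extras.
sub tokens_to_perl {
    my $code = '';
    for my $token (@_) {
        my ($type, $source, @params) = @$token;
        if    ($type eq 'VAR') { $code .= $source }
        elsif ($type eq 'DOT') { $code .= '->' }
        elsif ($type eq 'METHOD') {
            my $args = $params[0] || [];
            $code .= $source . '(' . join(', ', @$args) . ')';
        }
        elsif ($type eq 'OP') { $code .= " $source " }     # &&, || and ?: are already Perl
        elsif ($type eq 'STR') {
            (my $quoted = $source) =~ s/([\\'])/\\$1/g;     # escape for a '...' literal
            $code .= "'$quoted'";
        }
        else { die "Unknown token type '$type'\n" }
    }
    return $code;
}

package Toy::Order;    # hypothetical stand-in for the real order class
sub new        { my ($class, %f) = @_; return bless {%f}, $class }
sub invoice_id { $_[0]{invoice_id} }
package main;

# Tokens for: $order.invoice_id || 'Not invoiced'
my $code = tokens_to_perl(
    [ VAR => '$order' ], [ DOT => '.' ], [ METHOD => 'invoice_id', [] ],
    [ OP  => '||' ],     [ STR => 'Not invoiced' ],
);
my $generator = eval "sub { my (\$order) = \@_; $code }" or die $@;
print "$code\n";                                                 # $order->invoice_id() || 'Not invoiced'
print $generator->(Toy::Order->new(invoice_id => 1042)), "\n";   # 1042
print $generator->(Toy::Order->new), "\n";                       # Not invoiced
```

    A | filter does not translate this way, because x | currency has to become currency(x); that needs a real parse step rather than token-by-token copying.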

    update 2:

    If you choose either eval option above, you can expose variables to your value code like this:

        my $generator = eval "sub { my (\$order, \$order_line) = \@_; $code }";
        # ...
        $generator->($row->{order}, $row->{order_line});

    This exposes variables to your value code without the need of globals and namespaces.
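    The same trick can be driven by the %vars hash from the question, so that the generated sub declares one lexical per variable name instead of a hard-coded list. A minimal sketch, assuming a hypothetical helper called compile_value; the names are validated before being pasted into the eval'd source, and the expression itself is trusted Perl, as discussed above.

```perl
use strict;
use warnings;

# Compile a trusted Perl expression into a closure that takes a \%vars hash.
# Each name in @names becomes a lexical $name inside the generated sub.
sub compile_value {
    my ($code, @names) = @_;
    for (@names) {
        die "Bad variable name '$_'\n" unless /\A[A-Za-z_]\w*\z/;   # pasted into source
    }
    my $decl = @names
        ? 'my (' . join(', ', map { "\$$_" } @names) . ') = @{$_[0]}{qw(' . "@names" . ')};'
        : '';
    my $sub = eval "sub { $decl $code }";
    die $@ unless $sub;
    return $sub;
}

my %vars  = (lang => 'en', order => { total => 12.5 });
my $value = compile_value('$order->{total} . " ($lang)"', sort keys %vars);
print $value->(\%vars), "\n";    # prints 12.5 (en)
```

    The sub is compiled once per column; for each row, only the values in %vars change before it is called again.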

    Ted Young

    It is almost impossible for me to read contemporary mathematicians who, instead of saying "Petya washed his hands," write simply: "There is a t1 < 0 such that the image of t1 under the natural mapping t1 -> Petya(t1) belongs to the set of dirty hands, and a t2, t1 < t2 <= 0, such that the image of t2 under the above-mentioned mapping belongs to the complement of the set defined in the preceding sentence."
    The Russian mathematician V. I. Arnol'd

      Thanks for the reply Ted

      Yes, I had considered just eval'ing the code - as you surmised, this will be added by trusted users only. Paranoia, an aversion to string evals, and a desire to learn about parsing led me to this post. But evals may yet be the way to go.

      HOP::Parser is great but very, very slow.

      That is what I feared - good to know

      Probably my main reason for looking at a proper parser solution was to be able to handle nested expressions and logic branches (terminology?). I didn't want to waste time going down one road if it was obvious to everybody else that I shouldn't bother.

      Given what you've said, I'm going to give it a go, and just see where it takes me.

      thanks again

        Good. But now I feel bad. I want to qualify my statement about HOP::Lexer being slow (which is a relative statement). It is much faster than some alternatives, but much slower than lexing by hand (as shown above). I found that, in my grammars, lexing by hand was 10 times faster. That is not because HOP::Lexer is bad, but because it has to contend with streams, a feature that makes it much more powerful than lexing by hand, but completely unnecessary for what we are doing here.

        HOP::Lexer and Higher Order Perl are good products!

        Ted Young

Re: How to parse a limited grammar
by hsmyers (Canon) on Jan 11, 2008 at 01:50 UTC
    Broadly speaking, there are two preliminary options to think about here: roll your own or use a module. The usual arguments about code reuse etc. apply, but there is an important consideration. For the modules listed above, the learning curve is fairly steep, particularly for those with limited experience in writing parsers, by hand or otherwise. The trouble with writing parsers is that it is much easier if you've already written one! Without much analysis, your needs could probably be met by a simple 'thing at a time' hand-written approach. This is how most people learn to write parsers in the first place.
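    To make the 'thing at a time' approach concrete, here is a minimal hand-written recursive-descent sketch for a subset of the value syntax in the question: variables, dotted method calls with arguments, single-quoted strings, parentheses, | filters, &&, || and ?:. It is an illustration, not code from this thread. The precedence (loosest to tightest: ?:, ||, &&, |, then method calls), the convention that a filter receives the piped value before its own arguments, and the names compile_expr and Toy::Order are all assumptions. Each grammar rule returns a coderef instead of a value, so the ?:, && and || nodes decide at run time which sub-expression to call, and the untaken branch never runs.

```perl
use strict;
use warnings;

# Compile a value expression into a coderef called as $code->(\%vars, \%function).
sub compile_expr {
    my ($text) = @_;
    my @tok;
    pos($text) = 0;
    while (1) {
        $text =~ /\G\s+/gc;                               # skip whitespace
        last if $text =~ /\G\z/gc;
        if    ($text =~ /\G\$(\w+)/gc)             { push @tok, [ VAR => $1 ] }
        elsif ($text =~ /\G([A-Za-z_]\w*)/gc)      { push @tok, [ ID  => $1 ] }
        elsif ($text =~ /\G'([^']*)'/gc)           { push @tok, [ STR => $1 ] }
        elsif ($text =~ /\G(&&|\|\||[.|?:(),])/gc) { push @tok, [ OP  => $1 ] }
        else  { die 'Syntax error at position ' . pos($text) . "\n" }
    }
    my $p    = { tok => \@tok, i => 0 };                  # parser state: tokens + next index
    my $expr = _ternary($p);
    die "Unexpected trailing input\n" if $p->{i} < @tok;
    return $expr;
}

# Token helpers. Every grammar rule below returns a coderef instead of a value.
sub _peek   { my ($p, $op) = @_; my $t = $p->{tok}[ $p->{i} ]; return $t && $t->[0] eq 'OP' && $t->[1] eq $op }
sub _accept { my ($p, $op) = @_; return _peek($p, $op) ? ++$p->{i} : 0 }
sub _expect { _accept(@_) or die "Expected '$_[1]'\n" }
sub _next   { my ($p) = @_; return $p->{tok}[ $p->{i}++ ] || die "Unexpected end of expression\n" }
sub _ident  { my $t = _next($_[0]); return $t->[0] eq 'ID' ? $t->[1] : die "Expected a name near '$t->[1]'\n" }
sub _args {                                              # optional "( expr, expr, ... )"
    my ($p) = @_;
    return () unless _accept($p, '(');
    my @args;
    until (_peek($p, ')')) {
        push @args, _ternary($p);
        _accept($p, ',') or last;
    }
    _expect($p, ')');
    return @args;
}
sub _ternary {                                           # or ( '?' ternary ':' ternary )?
    my ($p) = @_;
    my $cond = _or($p);
    return $cond unless _accept($p, '?');
    my $then = _ternary($p);
    _expect($p, ':');
    my $else = _ternary($p);
    return sub { $cond->(@_) ? $then->(@_) : $else->(@_) };    # only the chosen branch runs
}
sub _or  { _binary($_[0], '||', \&_and) }                # and ( '||' and )*
sub _and { _binary($_[0], '&&', \&_filter) }             # filter ( '&&' filter )*
sub _binary {
    my ($p, $op, $next) = @_;
    my $left = $next->($p);
    while (_accept($p, $op)) {
        my ($l, $r) = ($left, $next->($p));
        $left = $op eq '&&' ? sub { $l->(@_) && $r->(@_) } : sub { $l->(@_) || $r->(@_) };
    }
    return $left;
}
sub _filter {                                            # primary ( '|' name args? )*
    my ($p) = @_;
    my $value = _primary($p);
    while (_accept($p, '|')) {
        my ($v, $name, @args) = ($value, _ident($p), _args($p));
        $value = sub {
            my $code = $_[1]{$name} or die "Unknown filter '$name'\n";
            return $code->($v->(@_), map { $_->(@_) } @args);    # piped value goes first
        };
    }
    return $value;
}
sub _primary {                                 # STR | '(' ternary ')' | VAR ( '.' name args? )*
    my ($p) = @_;
    my $t = _next($p);
    if ($t->[0] eq 'STR') { my $s = $t->[1]; return sub { $s } }
    if ($t->[0] eq 'OP' && $t->[1] eq '(') { my $e = _ternary($p); _expect($p, ')'); return $e }
    die "Unexpected '$t->[1]'\n" unless $t->[0] eq 'VAR';
    my $name  = $t->[1];
    my $value = sub { $_[0]{$name} };                    # look the variable up in %vars
    while (_accept($p, '.')) {
        my ($obj, $method, @args) = ($value, _ident($p), _args($p));
        $value = sub { $obj->(@_)->$method(map { $_->(@_) } @args) };
    }
    return $value;
}

package Toy::Order;                                      # hypothetical stand-in for the real class
sub new        { my ($class, %f) = @_; return bless {%f}, $class }
sub total      { $_[0]{total} }
sub invoice_id { $_[0]{invoice_id} }
package main;

my %function = (currency => sub { sprintf '%.2f', $_[0] });
my %vars     = (order => Toy::Order->new(total => 12.5));
print compile_expr($_)->(\%vars, \%function), "\n"
    for '$order.total | currency', q{$order.invoice_id || 'Not invoiced'};    # prints 12.50 then Not invoiced
```

    Parsing happens once per value string; the returned coderef is then called for every row with the current %vars.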

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."

Node Type: perlquestion [id://661593]
Approved by Corion
Front-paged by Corion