How to parse a limited grammar

clinton has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a reporting module, which allows a (knowledgeable) user to specify the content and format of a report in HTML/Excel/XML/PDF etc, without having to touch the code.

For instance, the report below would return a list of order_line objects, one for each row, and then write out each column in the row based on the current order_line object.

I would have a %vars hash, which would contain (e.g.) the $lang variable and would store the row variables $order_line, $order and $product, and a %function hash, which would contain the lookup() and currency() code refs:


    title:          This week's order lines
    object_type:    order_line
    query:
                    ... query params that return a list of Object IDs ...
    row_vars:
        order:      $order_line.order                   # Stores $order
        product:    $order_line.product                 # and $product in the %vars hash
    columns:
        -
            title:  Order
            value:  $order.id                           # $order->id
        -
            title:  Order date
            value:  $order.create_date.yMonthd($lang)   # $order->create_date->yMonthd($lang)
        -
            title:  Order total
            value:  $order.total | currency             # currency ($order->total)
            format:
                    align:  right
        -
            title:  Invoice no
            value:  $order.invoice_id || 'Not invoiced'
        -
            title:  Pickup address
            value:  $order_line.pickup_id
                        ? $order_line.pickup_id | lookup ('pickup',$lang,$order_line.pickup_id)
                        : $order_line.pickup_text

The question: how should I parse the `value` fields?

The value fields are short, discrete strings, in a TT (Template Toolkit) style. The grammar is limited, but allows for quite a lot of flexibility.

My first inclination was to use a few regexes and to string together some coderefs, the return value from the previous coderef being passed as an argument to the next coderef. This would be fine, until I hit the logical operators (&& || ?:). Suddenly, this introduced the concept of branching into the code.
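To make the coderef-chaining idea concrete, here is a minimal sketch of how `$order.total | currency` might be run as a linear chain. The contents of `%vars` and `%function`, the `run_chain` helper, and the use of a hash lookup in place of a real method call are all assumptions made for illustration, not code from the module described above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the %vars and %function hashes.
my %vars     = ( order => { total => 1234.5 } );
my %function = ( currency => sub { sprintf '%.2f', $_[0] } );

# "$order.total | currency" as a list of steps. Each step receives
# the return value of the previous step as its only argument.
my @steps = (
    sub { $vars{order} },                       # $order
    sub { $_[0]{total} },                       # .total (hash lookup, not a method call)
    sub { $function{currency}->( $_[0] ) },     # | currency
);

# Run the steps in order, threading the value through.
sub run_chain {
    my @chain = @_;
    my $value;
    $value = $_->($value) for @chain;
    return $value;
}

print run_chain(@steps), "\n";    # prints 1234.50
```

A flat list like this has no place to put `a ? b : c`: the condition has to choose between two sub-chains, so the structure stops being a list and becomes a tree.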

It feels like it should be a small job, and so I have been loath to load up a full parser to do it. Having read Higher Order Perl, I have considered using Dominus' HOP::Parser, but I'm wondering if I should bite the bullet and use one of the standard modules like Parse::RecDescent or Parse::Yapp.

I have no experience with any of these modules. What would you recommend to handle what (in my ignorant opinion) should be a light-weight parsing task?

thanks

Clint

UPDATE: I have now posted my parser here: A parser for a limited grammar


Re: How to parse a limited grammar by TedYoung (Deacon) on Jan 10, 2008 at 16:35 UTC
Well, if you are confident that the users of your software are trustworthy, you could simply apply one or more s///'s to the value and eval it:

    $value =~ s/\./->/g;
    ...
    my $generator = eval "sub { $value }";
    die $@ if $@;
    for (@rows) {
        ...
        print $generator->();
        ...
    }

This gives you tremendous power with a minimum amount of work. Though, if you don't completely trust your users, it may give you too much power. You could limit that with Safe, but that has been shown not to be a complete sandbox (it makes it much harder for the user to inadvertently screw things up, but if they have their heart set on it, they can exploit certain flaws).

HOP::Parser is great but very, very slow. You only need it if you are parsing streams. Here you have very short strings and, if the above solution won't work for you, you can use traditional lexing semantics to build up a list of tokens. Your grammar is probably simple enough to simply iterate over those tokens and generate valid Perl to be eval'ed (or derive the value as you go).

    # untested, just pseudo code
    my @tokens;
    local $_ = $value;
    while ( !/\G$/gc ) {    # while not at end of string
        push @tokens,
            /\G\./gc             ? [ 'DOT', '.' ]
          : /\G(\$\w+)/gc        ? [ 'VAR', $1 ]
          : /\G(\w+)\((.*?)\)/gc ? [ 'METHOD', $1, [ split /\s*,\s*/, $2 ] ]
          : /\G(\w+)/gc          ? [ 'METHOD', $1, [] ]
          # ... more tokenizers here ...
          : last;
    }

Note that if you want proper nesting of () in method calls, you will need to use a better regex. Continue to add tokenizers for operators (\|, &) etc. When done, just iterate over the list of tokens and generate either the value or, if necessary, trusted Perl code to eval.

Update:

    # again, untested, just for demonstration
    my $code = '';
    for (@tokens) {
        my ($type, $source, @params) = @$_;
        if ($type eq 'VAR') {
            $code .= $source;
        } elsif ($type eq 'DOT') {
            $code .= '->';
        } elsif ($type eq 'METHOD') {
            ...
        }
    }
    my $generator = eval "sub { $code }";
    die $@ if $@;
    for (@rows) {
        ...
        print $generator->();
        ...
    }

Update 2: If you choose either eval option above, you can expose variables to your value code like this (note the escaped \@_, so that it is not interpolated into the string):

    my $generator = eval "sub { my (\$order, \$order_line) = \@_; $code }";
    ...
    $generator->($row->{order}, $row->{order_line});

This exposes variables to your value code without the need of globals and namespaces.

Ted Young

It is almost impossible for me to read contemporary mathematicians who, instead of saying "Petya washed his hands," write simply: "There is a t1 < 0 such that the image of t1 under the natural mapping t1 -> Petya(t1) belongs to the set of dirty hands, and a t2, t1 < t2 <= 0, such that the image of t2 under the above-mentioned mapping belongs to the complement of the set defined in the preceding sentence."
The Russian mathematician V. I. Arnol'd
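For reference, a runnable version of the hand-lexing idea might look like the sketch below. It relies on Perl's `\G` anchor with the `/gc` modifiers to walk the string one token at a time. The token type names (VAR, METHOD, STRING, OP, NAME) and the `tokenize` function are assumptions chosen for illustration; the real grammar may need more token types.

```perl
use strict;
use warnings;

# Split a value expression into [ TYPE => text ] pairs.
# Dies on any character that no rule recognises.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    for ($src) {                        # alias $_ so \G and pos() apply to $src
        while (1) {
            /\G\s+/gc;                  # skip whitespace; /c keeps pos() on failure
            last if /\G\z/gc;           # end of input
            if    (/\G(\$\w+)/gc)              { push @tokens, [ VAR    => $1 ] }
            elsif (/\G\.(\w+)/gc)              { push @tokens, [ METHOD => $1 ] }
            elsif (/\G'([^']*)'/gc)            { push @tokens, [ STRING => $1 ] }
            elsif (/\G(\|\||&&|[|?:(),])/gc)   { push @tokens, [ OP     => $1 ] }
            elsif (/\G(\w+)/gc)                { push @tokens, [ NAME   => $1 ] }
            else {
                my $at = pos($_) // 0;
                die sprintf "Unexpected input at offset %d: '%s'\n",
                    $at, substr( $_, $at );
            }
        }
    }
    return @tokens;
}

# Show the token stream for one of the value fields from the question.
for my $t ( tokenize(q{$order.invoice_id || 'Not invoiced'}) ) {
    printf "%-6s %s\n", @$t;
}
```

Because argument lists are emitted as separate `(`, `,` and `)` tokens rather than captured by one regex, nesting of parentheses becomes the parser's problem instead of the lexer's.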
Re^2: How to parse a limited grammar by clinton (Priest) on Jan 10, 2008 at 16:55 UTC
Thanks for the reply, Ted.

Yes, I had considered just eval'ing the code - as you surmised, this will be added by trusted users only. Paranoia, an aversion to string evals, and a desire to learn about parsing led me to this post. But evals may yet be the way to go.

"HOP::Parser is great but very, very slow."

That is what I feared - good to know.

Probably my main reason for looking at a proper parser solution was to be able to handle nested expressions and logic branches (terminology?). I didn't want to waste time going down one road if it was obvious to everybody else that I shouldn't bother.

Given what you've said, I'm going to give it a go, and just see where it takes me.

thanks again
Re^3: How to parse a limited grammar by TedYoung (Deacon) on Jan 10, 2008 at 17:22 UTC
Good. But now I feel bad. I want to qualify my statement about HOP::Lexer being slow (which is a relative statement). It is much faster than some alternatives but is much slower than lexing by hand (as shown above). I found that, in my grammars, lexing by hand was 10 times faster. That is not because HOP::Lexer is bad, but because it has to contend with streams, a feature that makes it much more powerful than lexing by hand, but completely unnecessary for what we are doing here.

HOP::Lexer and Higher Order Perl are good products!

Ted Young
Re: How to parse a limited grammar by hsmyers (Canon) on Jan 11, 2008 at 01:50 UTC
Broadly speaking, there are two preliminary options to think about here: roll your own or use a module. The usual arguments about code reuse etc. apply here, but there is an important consideration. For the modules listed above, the learning curve is fairly steep, particularly for those with limited experience in writing parsers, by hand or otherwise. The trouble with writing parsers is that it is much easier if you've already written one! Without much analysis, your needs could probably be met by a simple 'thing at a time' hand-written approach. This is how most learn how (to write parsers) in the first place.

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."
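One possible shape for such a hand-written, 'thing at a time' parser is a small recursive-descent compiler that turns each expression into a closure. The sketch below covers only variables, quoted strings, `||` and `?:`; the grammar, the function names, and the flat hash of variables passed at call time are all assumptions for illustration, not the parser clinton eventually posted.

```perl
use strict;
use warnings;

# Grammar handled by this sketch (a small subset of the value syntax):
#   ternary := or ( '?' ternary ':' ternary )?
#   or      := primary ( '||' primary )*
#   primary := $name | 'string'

# Compile an expression into a coderef that takes a hashref of variables.
sub compile {
    my ($src) = @_;
    my @tok = $src =~ /\$\w+|'[^']*'|\|\||[?:]/g;   # crude lexer: skips anything else
    my $code = parse_ternary( \@tok );
    die "Unexpected token '$tok[0]'\n" if @tok;
    return $code;
}

sub parse_ternary {
    my ($tok) = @_;
    my $cond = parse_or($tok);
    return $cond unless @$tok && $tok->[0] eq '?';
    shift @$tok;                                    # consume '?'
    my $then = parse_ternary($tok);
    die "Expected ':'\n" unless @$tok && shift(@$tok) eq ':';
    my $else = parse_ternary($tok);
    # Only the selected branch is evaluated at run time.
    return sub { $cond->(@_) ? $then->(@_) : $else->(@_) };
}

sub parse_or {
    my ($tok) = @_;
    my $left = parse_primary($tok);
    while ( @$tok && $tok->[0] eq '||' ) {
        shift @$tok;                                # consume '||'
        my ( $lhs, $rhs ) = ( $left, parse_primary($tok) );
        $left = sub { $lhs->(@_) || $rhs->(@_) };
    }
    return $left;
}

sub parse_primary {
    my ($tok) = @_;
    my $t = shift @$tok;
    die "Unexpected end of expression\n" unless defined $t;
    if ( $t =~ /^\$(\w+)$/ ) { my $name = $1; return sub { $_[0]{$name} } }
    if ( $t =~ /^'(.*)'$/ )  { my $str  = $1; return sub { $str } }
    die "Unexpected token '$t'\n";
}

my $value = compile(q{$pickup_id ? $pickup_name : $pickup_text || 'n/a'});
print $value->( { pickup_id => 0, pickup_text => 'Front desk' } ), "\n";   # prints Front desk
```

Each grammar rule becomes one function, and branching falls out naturally because every node is a closure that decides which of its children to call. Method calls and the `|` filter would be further rules added in the same style.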

