Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Dynamic expansion of tokens with Parse::RecDescent

by jaldhar (Vicar)
on Aug 18, 2004 at 20:31 UTC ( #384098=perlquestion: print w/ replies, xml ) Need Help??
jaldhar has asked for the wisdom of the Perl Monks concerning the following question:

assume I had a Parse::RecDescent grammar that looked in part like this:

order: quantity color item quantity: /\d+/ color: # what should this be? item: 'pen' | 'pencil'

now assume in one script that uses this grammar, there was:

my @colors = ('red', 'green', 'blue');

while in another script there was:

my @colors = ('cyan', 'magenta', 'burnt umber');

How would I write the definition for color in the grammer so it would match only valid @colors? There may be many other scripts with different color combinations, and as the second example shows, they can be arbitrary strings with embedded spaces.

I suspect the answer involves having a function in each script like this:

sub is_valid_color { my ($color) = @_; foreach (@colors) { return 1 if $color eq $_; } return undef; }

but for some reason I just can't make the connection.

--
જલધર

Comment on Dynamic expansion of tokens with Parse::RecDescent
Select or Download Code
Re: Dynamic expansion of tokens with Parse::RecDescent
by revdiablo (Prior) on Aug 18, 2004 at 20:54 UTC

    I don't have any experience with Parse::RecDescent, but couldn't you just use the array as the basis to generate a grammar rule? For example:

    my @colors = qw(red green blue); my $rule = "color: " . join " | ", map "'$_'", @colors;
Re: Dynamic expansion of tokens with Parse::RecDescent
by Solo (Deacon) on Aug 18, 2004 at 21:04 UTC
    You might try the Extend method like so:

    use Parse::RecDescent; my @colors = ('periwinkle','indigo'); my $grammar = q[ order: quantity color item quantity: /\d+/ color: item: 'pen' | 'pencil' ]; my $parser = new Parse::RecDescent ($grammar); $parser->Extend("color: '" . join(q['|'],@colors) . "'" );

    Or cram it in the grammar...

    my $grammar = q[ order: quantity color item quantity: /\d+/ color: '] . join(q['|'],@colors) . q[' item: 'pen' | 'pencil' ];

    Update: revdiablo beat me to it... note to self: type less.

    --Solo

    --
    You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
Re: Dynamic expansion of tokens with Parse::RecDescent
by amw1 (Friar) on Aug 18, 2004 at 21:13 UTC
    This will produce a spiffy data structure for you. I'm assuming that you can get the array into is_valid_color cleanly.
    use Parse::RecDescent; use strict; use Data::Dumper; sub is_valid_color { my $color_to_test = shift(); return $color_to_test if ($color_to_test =~ /red|green|blue/); return undef; } my $grammar = << 'GRAMMAR'; start : order(s) order: quantity color product { $return = { quant => $item{'quantity'}, color => $item{'color'}, item => $item{'product'}, }; } quantity: /\d+/ color: /\w+/ { $return = main::is_valid_color($item[1]); } | <error> product: 'pencil' | 'pen' GRAMMAR my $text; $text = "1 red pencil\n"; $text .= "2 blue pencil\n"; $text .= "4 green pen\n"; my $parser = new Parse::RecDescent($grammar) || die "Bad Grammar"; my $product_list = $parser->start($text) || die "Bad Syntax"; print Dumper($product_list);
    output is
    $VAR1 = [ { 'color' => 'red', 'item' => 'pencil', 'quant' => '1' }, { 'color' => 'blue', 'item' => 'pencil', 'quant' => '2' }, { 'color' => 'green', 'item' => 'pen', 'quant' => '4' } ];
    I changed item to product to avoid confusion with the multiple contexts of item. Also, you must put pencil before pen as Recdescent will find the first match not the longest match. /pen/ will match both pen and pencil so put pencil first.

      Can you try running that $text .= "5 purple pen\n"; added? I don't think you'll get an error (although the colour will be 'undef' in the result). I can't verify this for myself since I don't have this module.

      Update: Nevermind, I see something in the docs I didn't see before, concernign the returning values of actions.. I think the | <error> is unnecessary, btw. That's what confused me.

        It will die with a bad syntax error I believe. There is more error handling that can be done I just didn't do it.

        I think the error tag will give you a little more detail on where the error occured etc. I didn't have a ton of time to play around with it.

Re: Dynamic expansion of tokens with Parse::RecDescent
by ikegami (Pope) on Aug 18, 2004 at 21:39 UTC

    The real problem is deciding where the colour ends, since you want two word colours. (You mentioned 'burnt umber' as possibly being a valid colour.) What follows is a grammar that allows colours to be quoted. It's good practive to seperate token rules from the grammar rules.

    -- grammar: # grammar: order: quantity color item quantity: number color: id | qstring item: id # tokens: number: /\d+/ id: /[a-zA-Z][-_a-zA-Z0-9]*/ qstring: /"(?:\\["\\]|[^"\\])*"/ { $return = unquote($item[1]); } | /'(?:\\['\\]|[^'\\])*'/ { $return = unquote($item[1]); } -- helper (non-grammar) code: sub unquote { local $_ = $_[0]; s/^['"]//; s/['"]$//; s/\\(.)/\1/g; return $_; } --

    Checking whether the colour is legit or not is usually done after the parsing is done, or at least, when the tokenizing is done. That allows you to find multiple errors while running the compiler only once. For example, if you've read Larry Walls's Apocolypses, you may have noticed he makes a few allusions to items that shouldn't be verified by the parser for just this reason. What follows is a piece of code that introduces validation of colours:

    -- helper (non-grammar) code: my @COLORS = ( 'red' 'green' 'blue' 'cyan' 'magenta' 'burnt umber' ); my %COLORS; { my $idx = 0; %COLORS = map { $_ => $idx++ } @COLORS; } sub unquote { local $_ = $_[0]; s/^['"]//; s/['"]$//; s/\\(.)/\1/g; return $_; } my $error = 0; ... parse ... die("Validation errors were found.\n") if ($error); -- grammar: # grammar: order: quantity color item quantity: number color: id | qstring { $return = $COLORS{$item[1]}; # Uncomment the following if you want # to have multiple errors listed. # unless (defined($return)) # { # warn("Line $thisline: Invalid color.\n"); # $error = 1; # $return = undef; # } # 1; } item: id { ($return) = $item[1] =~ /^(pen|pencil)$/; # Uncomment the following if you want # to have multiple errors listed. # unless (defined($return)) # { # warn("Line $thisline: Invalid item.\n"); # $error = 1; # $return = undef; # } # 1; } # tokens: number: /\d+/ id: /[a-zA-Z][-_a-zA-Z0-9]*/ qstring: /"(?:\\["\\]|[^"\\])*"/ { $return = unquote($item[1]); } | /'(?:\\['\\]|[^'\\])*'/ { $return = unquote($item[1]); } --
Re: Dynamic expansion of tokens with Parse::RecDescent
by jaldhar (Vicar) on Aug 19, 2004 at 20:17 UTC

    Thanks to all that replied. amw1's code was helpful but didn't deal with the possibility of multiple-word color names. I like how he got the completed order into a data structure though.

    ikegami's advice to seperate tokenizing and parsing was good and I will probably have to go that way soon to deal with things like "burnt umber", "lt. blk" etc. It seems it will be easier and more modular. Unfortunately I have a lot of input already in a fixed format so I can't quote the color names. I'll see if it can be changed for future input.

    In the end I went with a variation on solo and revdiablo's suggestions:

    color: /(??{join ("|", @colors)})/

    --
    જલધર

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://384098]
Approved by bart
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (8)
As of 2014-10-25 17:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (147 votes), past polls