Parse:RecDescent grammar help

pip9ball has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parse:RecDescent grammar help by thekestrel (Friar) on Oct 26, 2004 at 00:50 UTC
lolo, I've just been learning Parse::RecDescent myself, so this was a good exercise for me. This should do the trick. You can define plain variables, arrays or hashes. Statements take semicolons at the end (just to be pretty). The two alternatives on `hashelement` and `arrayterm` are for with and without commas (mainly so the last element of the array doesn't 'require' a comma after it). Most of the rest should be pretty self-explanatory. You might want to add another alternative inside `term` so a variable can be one of the options, i.e. `$var = $var2` or the like =)

(My first answered question... woot!)

regards Paul

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper ();
    use Parse::RecDescent ();

    my $grammar = q{

        # --- Tokens ---

        EOF        : /^\Z/
        IDENTIFIER : /[A-Za-z]\w*/
        LITERAL    : /\d+/
        VAR        : '$'
        ARRAY      : '@'
        HASH       : '%'
        EQUAL      : '='
        QUOTE      : '"'
        HASHASSIGN : '=>'

        # --- Rules ---

        parse       : stmt(s?) EOF    { $item[1] }

        stmt        : variable ';'    { $item[1] }
                    | array ';'       { $item[1] }
                    | hash ';'        { $item[1] }
                    | <error>

        arrayterm   : term ','        { [ @item[0, 1] ] }
                    | term            { [ @item[0, 1] ] }

        array       : ARRAY IDENTIFIER EQUAL arrayterm(s?)
                                      { [ @item[0, 2, 4] ] }

        hashelement : IDENTIFIER HASHASSIGN term ','  { [ @item[0, 1, 3] ] }
                    | IDENTIFIER HASHASSIGN term      { [ @item[0, 1, 3] ] }

        hash        : HASH IDENTIFIER EQUAL '{' hashelement(s?) '}'
                                      { [ @item[0, 2, 5] ] }

        variable    : VAR IDENTIFIER EQUAL term       { [ @item[0, 2, 4] ] }

        term        : QUOTE IDENTIFIER QUOTE { [ 'identifier', $item[2] ] }
                    | LITERAL                { [ 'literal', $item[1] ] }
    };

    $::RD_HINT = 1;

    my $parser = Parse::RecDescent->new($grammar);
    die("Bad grammar.\n") unless defined($parser);

    my $text = q{
        $dog = 5;
        $dog = "fluffy";
        @arr = 5,"canary",7;
        %stuff = { animal => "dog", age => 5, name => "fluffy" };
    };

    my $result = $parser->parse(\$text);
    die("Bad text.\n") unless defined($result);

    print Data::Dumper::Dumper($result);
Re: Parse:RecDescent grammar help by tachyon (Chancellor) on Oct 26, 2004 at 01:15 UTC
I would imagine you would need something like this. Each target language would have its own set of dispatch methods. The major issue I see with your generic syntax is that you appear to be using newlines as the statement terminators.

Update: thekestrel's excellent post made me feel obligated to lift my game!

    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;

    my $data = <<'DATA';
    $string = 'Hello World'
    $integer = 1234
    $decimal = 1.234
    @ary = [ 'foo', "bar", 123, 1.23 ]
    %hash = { foo => 'bar', "baz" => 123 }
    DATA

    my $grammar = <<'GRAMMAR';
    SV       : '$'
    HV       : '%'
    AV       : '@'
    VARNAME  : /\w+/
    QV       : /"/ /[^"]*/ /"/      { [ 'char', $item[2] ] }
             | /'/ /[^']*/ /'/      { [ 'char', $item[2] ] }
    INT      : /\d+/                { [ 'int', $item[1] ] }
    FLOAT    : /\d*\.\d+/           { [ 'float', $item[1] ] }
    SCALAR   : QV | FLOAT | INT
    LIST     : SCALAR /,/           { $item[1] }
             | SCALAR               { $item[1] }
    HASHBIT  : VARNAME /=>/ SCALAR  { [ @item[1,3] ] }
             | QV /=>/ SCALAR       { [ @item[1,3] ] }
    HASHLIST : HASHBIT /,/          { $item[1] }
             | HASHBIT              { $item[1] }
    EQUALS   : '='
    start         : statement(s)
    statement     : assign_scalar | assign_array | assign_hash
    assign_scalar : SV VARNAME EQUALS SCALAR               { [ @item[0,2,4] ] }
    assign_array  : AV VARNAME EQUALS '[' LIST(s) ']'      { [ @item[0,2,5] ] }
    assign_hash   : HV VARNAME EQUALS '{' HASHLIST(s) '}'  { [ @item[0,2,5] ] }
    GRAMMAR

    my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
    my $lang   = Language::C->new();

    my $parse_tree = $parser->start($data) or die "Bad input\n";
    print Dumper $parse_tree;

    # Each statement is [ rule_name, var_name, value ]; the rule name
    # doubles as the name of the method that handles it.
    for my $item (@$parse_tree) {
        my $method = $item->[0];
        $lang->$method(@$item);
    }

    package Language::C;

    sub new { bless {}, shift }

    sub assign_scalar {
        my ( $self, $function, $varname, $value ) = @_;
        if ( $value->[0] eq 'int' ) {
            print "int $varname = $value->[1];\n";
        }
        elsif ( $value->[0] eq 'float' ) {
            print "double $varname = $value->[1];\n";
        }
        else {
            print qq!char $varname!.qq![] = "$value->[1]";\n!;
        }
    }

    sub assign_array { print Data::Dumper::Dumper \@_ }
    sub assign_hash  { print Data::Dumper::Dumper \@_ }

cheers

tachyon
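To illustrate "each target language would have its own set of dispatch methods", here is a minimal sketch that feeds one hand-written list of statements (in the `[ rule_name, var_name, [ type, value ] ]` shape the `assign_scalar` rule above returns) to two emitter classes. `Language::Python` and the `translate` helper are hypothetical additions, and the emitters return strings instead of printing so the results are easy to reuse. No Parse::RecDescent is needed to run it.

```perl
use strict;
use warnings;

# Hand-written statements in the [ rule_name, var_name, [ type, value ] ]
# shape that the assign_scalar rule returns.
my @parse_tree = (
    [ 'assign_scalar', 'integer', [ 'int',   1234 ] ],
    [ 'assign_scalar', 'decimal', [ 'float', 1.234 ] ],
    [ 'assign_scalar', 'string',  [ 'char',  'Hello World' ] ],
);

# Hypothetical helper: element 0 of each statement is used as the method
# name on the emitter object; can() gives a clear error for missing handlers.
sub translate {
    my ( $lang, @statements ) = @_;
    my @lines;
    for my $stmt (@statements) {
        my $method = $stmt->[0];
        die "No handler for '$method'\n" unless $lang->can($method);
        push @lines, $lang->$method(@$stmt);
    }
    return @lines;
}

package Language::C;    # returns C declarations as strings
sub new { return bless {}, shift }

sub assign_scalar {
    my ( $self, $rule, $name, $value ) = @_;
    my ( $type, $v ) = @$value;
    return "int $name = $v;"    if $type eq 'int';
    return "double $name = $v;" if $type eq 'float';
    return sprintf 'char %s[] = "%s";', $name, $v;
}

package Language::Python;    # hypothetical second target, same method names
sub new { return bless {}, shift }

sub assign_scalar {
    my ( $self, $rule, $name, $value ) = @_;
    my ( $type, $v ) = @$value;
    return $type eq 'char' ? qq{$name = "$v"} : "$name = $v";
}

package main;

print "$_\n" for translate( Language::C->new, @parse_tree );
print "$_\n" for translate( Language::Python->new, @parse_tree );
```

Because both classes answer to the same method names, swapping the object passed to `translate` is all it takes to change the output language.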
Re: Parse:RecDescent grammar help by merlyn (Sage) on Oct 26, 2004 at 14:10 UTC
You might get some help from the PRD grammar I wrote to parse Data::Dumper's output.

--
Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: Parse:RecDescent grammar help by pip9ball (Acolyte) on Oct 26, 2004 at 04:24 UTC
Thank you both for the quick replies. Of course, now I have some more questions.

1.) Can you explain the following code? How does `$lang->$method(@$item);` know which subroutine to call?

2.) From looking at the output of Data::Dumper for the array and hash subroutines, it looks like the data is stored in arrays or nested arrays. I'm having difficulty understanding how to extract this data. It's nice that Data::Dumper is smart enough to print the structure, but I'm not sure how it does it.

3.) Can this grammar be modified to allow the following entries?

    $a = "hello"
    $b = "world"
    @array = $a, $b
    %hash = key=>@array

etc...

Once again, thank you for your help!

-Phillip
Re^2: Parse:RecDescent grammar help by tachyon (Chancellor) on Oct 26, 2004 at 06:42 UTC
1) The first element of `@item` (i.e. `$item[0]`) is the PRD rule name. You will notice that in a lot of cases I am grabbing items 1+ and ignoring item 0, because we want the matched data, not the name of the rule. In the three 'method' rules we select 0,2,4 (or 0,2,5) from `@item`, which gives us the rule_name, var_name and assign_data. The rule name is the same as the method name. Get it? The rule we match also tells us which method to call to deal with the data.

2) The data is stored as array refs, or array refs of array refs. See perlreftut. I have given you an example of how to access a typical value. The parse tree is an array ref that holds more array refs, which probably hold yet more array refs. The first level of array refs is what we iterate over. We assign each one to `$item`, and each is the result of one successful rule parse. The `@$` syntax gets us an array from our array ref; `$ref->[0]` gets us item 0 rather than the whole list.

3) You can modify the grammar to your heart's content. The more complexity you add, the more problems you are going to find. You have what looks a hell of a lot like Perl 5 syntax, and Perl is a bitch to parse. Why not just use a real language and let its parser generate a parse tree for you? I am not sure you have considered just how complex a project what you propose is.

cheers

tachyon
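A small sketch of point 2, using a hand-written tree shaped like the output of tachyon's grammar (the values are made up): `@$ref` and `->[n]` for direct access, plus a generic recursive walk that returns every leaf indented by how deeply it is nested. `walk` is a hypothetical helper for illustration, not part of Parse::RecDescent or Data::Dumper.

```perl
use strict;
use warnings;

# Hand-written tree in the shape tachyon's grammar returns (values made up):
# each statement is [ rule_name, var_name, value ].
my $parse_tree = [
    [ 'assign_scalar', 'integer', [ 'int', 1234 ] ],
    [ 'assign_array',  'ary',     [ [ 'char', 'foo' ], [ 'int', 123 ] ] ],
];

# Direct access: @$ref gives the whole list, ->[n] picks one element.
my @statements = @$parse_tree;              # the two statements
my $first_rule = $parse_tree->[0][0];       # 'assign_scalar'
my $first_int  = $parse_tree->[0][2][1];    # 1234

# Generic walk: recurse into array refs, return leaves indented by depth.
sub walk {
    my ( $node, $depth ) = @_;
    if ( ref $node eq 'ARRAY' ) {
        return map { walk( $_, $depth + 1 ) } @$node;
    }
    return ( '  ' x $depth ) . $node;
}

print scalar(@statements), " statements; first is $first_rule ($first_int)\n";
print "$_\n" for walk( $parse_tree, 0 );
```

This is roughly the idea behind how Data::Dumper can print any structure: check `ref` on each value and recurse whenever it finds another reference.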
Re^3: Parse:RecDescent grammar help by pip9ball (Acolyte) on Oct 26, 2004 at 13:34 UTC
Tachyon, thanks for the reply... I think I understand my questions now. I had no idea the rule name was passed back as `$item[0]`. You're right about me not considering how complex this project is... the generic language can be anything, not necessarily what I proposed. I'm not quite sure what you mean when you say "why not use a real language"? Do you have any examples? I was trying to come up with a generic language that allows the basic data structures (element, array, hash) and that will eventually get parsed and translated to a tool-specific language... this way variables only need to be specified in one place and converted if needed. If you have any suggestions, please don't hesitate to share them. Once again, I really appreciate all of your knowledge and help!!

regards, Phillip
Re^2: Parse:RecDescent grammar help by thekestrel (Friar) on Oct 26, 2004 at 10:28 UTC
Hi Phillip,

Firstly, as tachyon says, these rules can get really funky really quickly, especially when you want to try and model nested things or conditional instructions (as I'm finding out with my tinkering). That aside, I've remodelled my rules (using my last example code) to accommodate the type of entries you wanted.

First, here are some definitions. In my little language these are my types:

    5          # Any number is a 'literal'
    "fluffy"   # encased text I call an 'identifier'
    $cuddles   # this is a 'variable'
    @animals   # this is an 'array'
    %stuff     # this is a 'hash'

Now if you follow the rules you'll see that you can have pretty much any combination of these, so you can do seksi things like this (put this in the text section from before as an example):

    %stuff = { animal => @pets, age => 5, name => "fluffy", colour => $col };

(Just as a side note, the `$a = "hello"` and `$b = "world"` entries should already have worked with my existing program; this bit is so you can embed 'variable's and 'array's in things.)

Replace all the bits in my 'rules' section from before with the following and that should spice things up:

    # --- Rules ---

    parse        : stmt(s?) EOF    { $item[1] }

    stmt         : variable ';'    { $item[1] }
                 | array ';'       { $item[1] }
                 | hash ';'        { $item[1] }
                 | <error>

    arrayelement : term ','        { [ @item[0, 1] ] }
                 | term            { [ @item[0, 1] ] }

    arrayname    : ARRAY IDENTIFIER          { [ 'array', $item[2] ] }

    array        : arrayname EQUAL arrayelement(s?)
                                             { [ @item[0, 1, 3] ] }

    hashelement  : IDENTIFIER HASHASSIGN term ','  { [ @item[0, 1, 3] ] }
                 | IDENTIFIER HASHASSIGN term      { [ @item[0, 1, 3] ] }

    hash         : HASH IDENTIFIER EQUAL '{' hashelement(s?) '}'
                                             { [ @item[0, 2, 5] ] }

    variablename : VAR IDENTIFIER            { [ 'variable', $item[2] ] }

    variable     : variablename EQUAL term   { [ @item[0, 1, 3] ] }

    term         : QUOTE IDENTIFIER QUOTE    { [ 'identifier', $item[2] ] }
                 | LITERAL                   { [ 'literal', $item[1] ] }
                 | arrayname                 { $item[1] }
                 | variablename              { $item[1] }

....and the output for the example I gave you...
    $VAR1 = [
              [
                'hash',
                'stuff',
                [
                  [ 'hashelement', 'animal', [ 'array', 'pets' ] ],
                  [ 'hashelement', 'age', [ 'literal', '5' ] ],
                  [ 'hashelement', 'name', [ 'identifier', 'fluffy' ] ],
                  [ 'hashelement', 'colour', [ 'variable', 'col' ] ]
                ]
              ]
            ];

Have phun...

Regards

Paul
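To pull values back out of a structure like the output above, you can unpack each level with a list assignment. This is a minimal sketch that copies the `hash` statements into a native Perl hash of hashes; it assumes the `[ type, name ]` term shapes shown in that output, and `term_to_string` and the `%sigil` table are hypothetical helpers that just re-attach `$` or `@` for variables and arrays.

```perl
use strict;
use warnings;

# The structure from the Data::Dumper output above, typed in by hand.
my $result = [
    [   'hash', 'stuff',
        [   [ 'hashelement', 'animal', [ 'array',      'pets' ] ],
            [ 'hashelement', 'age',    [ 'literal',    '5' ] ],
            [ 'hashelement', 'name',   [ 'identifier', 'fluffy' ] ],
            [ 'hashelement', 'colour', [ 'variable',   'col' ] ],
        ],
    ],
];

# Hypothetical helper: turn a [ type, value ] term back into text,
# re-attaching the sigil for variables and arrays.
my %sigil = ( variable => '$', array => '@' );

sub term_to_string {
    my ($term) = @_;
    my ( $type, $value ) = @$term;
    return exists $sigil{$type} ? $sigil{$type} . $value : $value;
}

# Copy every 'hash' statement into a native hash of hashes.
my %hashes;
for my $stmt (@$result) {
    my ( $kind, $name, $elements ) = @$stmt;
    next unless $kind eq 'hash';
    for my $element (@$elements) {
        my ( undef, $key, $term ) = @$element;    # skip the 'hashelement' tag
        $hashes{$name}{$key} = term_to_string($term);
    }
}

for my $key ( sort keys %{ $hashes{stuff} } ) {
    print "$key => $hashes{stuff}{$key}\n";
}
```

Limiting how deeply statements can nest, as suggested later in the thread, keeps this kind of unpacking to a fixed number of loops; arbitrary nesting needs a recursive walk instead.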
Re^3: Parse:RecDescent grammar help by pip9ball (Acolyte) on Oct 26, 2004 at 13:39 UTC
Paul, thank you for the reply... I will try these changes out after jury duty :-( From the examples you and tachyon provided, I realize that the output of what's parsed can become very complex and get nasty real quick. Perhaps I'll need to set some limits on how many nested statements are allowed... otherwise retrieving this data is going to be a nightmare. Thanks again!

-Phillip
Re^4: Parse:RecDescent grammar help by thekestrel (Friar) on Oct 27, 2004 at 03:49 UTC
Re^3: Parse:RecDescent grammar help by pip9ball (Acolyte) on Oct 27, 2004 at 02:11 UTC
Paul, what version of Perl are you using to produce this output? I am getting errors with the following input data:

    my $text = q {
        @dogs = ["dollar","mack"];
        %myHash = {animals => @dogs, age = 5, names=> "fluffy"};
    };
    my $result = $parser->parse(\$text);

Output:

    ERROR (line -1): Invalid stmt: Was expecting ';' but found "["dollar","mack"];" instead
    Bad text.

Thanks,

-Phillip
Re^4: Parse:RecDescent grammar help by thekestrel (Friar) on Oct 27, 2004 at 03:41 UTC

