PerlMonks  

Parse:RecDescent grammar help

by pip9ball (Acolyte)
on Oct 25, 2004 at 23:16 UTC ( [id://402387]=perlquestion )

pip9ball has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I was wondering if anyone could help me begin to write a grammar for a conversion project. I have a generic language in which variables, arrays, and hashes are defined. I want these variables to be translated to a tool-specific data structure, e.g. Tcl, Scheme, etc. I am assuming I would have to create a different grammar for each language I want the variables translated to. However, I am having trouble defining the grammar for parsing the generic language. Could you help me get started by showing me how I would go about parsing the three structures in the GenericLanguage section below? I would appreciate any insight or suggestions.

Thanks in advance,

-Phillip

For example:

    __GenericLanguage__
    $var = 10
    $var1 = "variable1"
    @var2 = [1,2,3]
    %var3 = {'key1'=>'value1', 'key2'=>'value2'}

    __Translate-to-SchemeLanguage__
    Define var1 = variable1
    Define var2 = '(1 2 3)

    __Translate-to-TCLLanguage__
    Set var1 variable

Edited by Chady -- added code tags.

Replies are listed 'Best First'.
Re: Parse:RecDescent grammar help
by thekestrel (Friar) on Oct 26, 2004 at 00:50 UTC

    lolo,

    I've just been learning Parse::RecDescent myself, so this was a good exercise for me.... This should do the trick:
    - You can define plain variables, arrays or hashes
    - It takes semicolons at the end of each statement (just to be pretty)
    - The two alternatives in hashelement and arrayterm are for with and without a trailing comma (mainly so the last element of the array doesn't 'require' a comma after it)

    Most of the rest should be pretty self-explanatory. You might want to add another type inside 'term' so you can make a variable one of the options, i.e. $var = $var2 or the like.
    =) (my first answered question...woot!)


    regards Paul
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Data::Dumper ();
    use Parse::RecDescent ();

    my $grammar = q{

        # --- Tokens ---

        EOF        : /^\Z/
        IDENTIFIER : /[A-Za-z]\w*/
        LITERAL    : /\d+/
        VAR        : '$'
        ARRAY      : '@'
        HASH       : '%'
        EQUAL      : '='
        QUOTE      : '"'
        HASHASSIGN : '=>'

        # --- Rules ---
        # $item[0] is the name of the rule; $item[1..n] are the matched parts.

        parse : stmt(s?) EOF { $item[1] }

        stmt  : variable ';' { $item[1] }
              | array    ';' { $item[1] }
              | hash     ';' { $item[1] }
              | <error>

        arrayterm : term ',' { [ @item[0, 1] ] }
                  | term     { [ @item[0, 1] ] }

        # returns [ rule name, array name, list of terms ]
        array : ARRAY IDENTIFIER EQUAL arrayterm(s?)
                { [ @item[0, 2, 4] ] }

        # returns [ rule name, key, value term ]
        hashelement : IDENTIFIER HASHASSIGN term ',' { [ @item[0, 1, 3] ] }
                    | IDENTIFIER HASHASSIGN term     { [ @item[0, 1, 3] ] }

        hash : HASH IDENTIFIER EQUAL '{' hashelement(s?) '}'
               { [ @item[0, 2, 5] ] }

        variable : VAR IDENTIFIER EQUAL term
                   { [ @item[0, 2, 4] ] }

        term : QUOTE IDENTIFIER QUOTE { [ 'identifier', $item[2] ] }
             | LITERAL                { [ 'literal', $item[1] ] }
    };

    $::RD_HINT = 1;

    my $parser = Parse::RecDescent->new($grammar);
    die("Bad grammar.\n") unless defined($parser);

    my $text = q {
    $dog = 5;
    $dog = "fluffy";
    @arr = 5,"canary",7;
    %stuff = { animal => "dog", age => 5, name => "fluffy" };
    };

    my $result = $parser->parse(\$text);
    die("Bad text.\n") unless (defined($result));

    print Data::Dumper::Dumper($result);
Re: Parse:RecDescent grammar help
by tachyon (Chancellor) on Oct 26, 2004 at 01:15 UTC

    I would imagine you would need something like this. Each target language would have its own set of dispatch methods. The major issue I see with your generic syntax is that you appear to be using newlines as the statement terminators.

    Update: thekestrel's excellent post made me feel obligated to lift my game!

    use Parse::RecDescent;
    use Data::Dumper;

    my $data =<<'DATA';
    $string = 'Hello World'
    $integer = 1234
    $decimal = 1.234
    @ary = [ 'foo', "bar", 123, 1.23 ]
    %hash = { foo => 'bar', "baz" => 123 }
    DATA

    my $grammar =<<'GRAMMAR';

    SV       : '$'
    HV       : '%'
    AV       : '@'
    VARNAME  : /\w+/
    QV       : /"/ /[^"]*/ /"/ { [ 'char', $item[2] ] }
             | /'/ /[^']*/ /'/ { [ 'char', $item[2] ] }
    INT      : /\d+/           { [ 'int', $item[1] ] }
    FLOAT    : /\d*\.\d+/      { [ 'float', $item[1] ] }
    SCALAR   : QV | FLOAT | INT
    LIST     : SCALAR /,/      { $item[1] }
             | SCALAR          { $item[1] }
    HASHBIT  : VARNAME /=>/ SCALAR { [ @item[1,3] ] }
             | QV /=>/ SCALAR      { [ @item[1,3] ] }
    HASHLIST : HASHBIT /,/     { $item[1] }
             | HASHBIT         { $item[1] }
    EQUALS   : '='

    start         : statement(s)
    statement     : assign_scalar | assign_array | assign_hash
    assign_scalar : SV VARNAME EQUALS SCALAR
                    { [ @item[0,2,4] ] }
    assign_array  : AV VARNAME EQUALS '[' LIST(s) ']'
                    { [ @item[0,2,5] ] }
    assign_hash   : HV VARNAME EQUALS '{' HASHLIST(s) '}'
                    { [ @item[0,2,5] ] }

    GRAMMAR

    my $parser = Parse::RecDescent->new($grammar);
    my $lang = Language::C->new();

    my $parse_tree = $parser->start($data);

    print Dumper $parse_tree;

    for my $item( @$parse_tree ) {
        my $method = $item->[0];
        $lang->$method(@$item);
    }

    package Language::C;

    sub new { bless {}, shift }

    sub assign_scalar {
        my ( $self, $function, $varname, $value ) = @_;
        if ( $value->[0] eq 'int' ) {
            print "int $varname = $value->[1];\n";
        }
        elsif ( $value->[0] eq 'float' ) {
            print "double $varname = $value->[1];\n";
        }
        else {
            print qq!char $varname!.qq![] = "$value->[1]";\n!;
        }
    }

    sub assign_array { print Data::Dumper::Dumper \@_ }

    sub assign_hash { print Data::Dumper::Dumper \@_ }

    cheers

    tachyon

•Re: Parse:RecDescent grammar help
by merlyn (Sage) on Oct 26, 2004 at 14:10 UTC
Re: Parse:RecDescent grammar help
by pip9ball (Acolyte) on Oct 26, 2004 at 04:24 UTC
    Thank you both for the quick replies. Of course, now I have some more questions.

    1.) Can you explain the following code? How does $lang->$method(@$item); know which subroutine to call?

    2.) From looking at the output of Data::Dumper for the array and hash subroutines, it looks like this is stored in arrays or nested arrays. I'm having difficulty understanding how to extract this data. It's nice that Data::Dumper is smart enough to print the structure, but I'm not sure how it does it.

    3.) Can this grammar be modified to allow the following entries?

    $a = "hello"

    $b = "world"

    @array = $a, $b

    %hash = key=>@array

    etc...

    Once again, thank you for your help!

    -Phillip

      1) The first element of @item (i.e. $item[0]) is the PRD rule name. You will notice that in a lot of cases I am grabbing item 1+ and ignoring item 0 because we want the matched data, not the name of the rule. In the three 'method' rules we select 0,2,4/5 from @item, which gives us the rule_name, var_name, assign_data. The rule name is the same as the method name. Get it? The rule we match also tells us what function to call to deal with the data.

      2) The data is stored as array refs or array refs of array refs. See perlreftut. I have given you an example of how to access a typical value. The parse tree is an array ref that holds more array refs, which probably hold yet more array refs. The first level of array refs is what we iterate over. We assign each one to $item, and this is the result of one successful rule parse. The @$ syntax gets us an array from our array ref. $ref->[0] gets us item 0, rather than the whole list.

      3) You can modify the grammar to your heart's content. The more complexity you add, the more problems you are going to find. You have what looks a hell of a lot like Perl5 syntax, and Perl is a bitch to parse. Why not just use a real language and let its parser generate a parse tree for you? I am not sure you have considered just how complex the project you propose is.
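
      Points 1 and 2 can be tried without Parse::RecDescent at all. The following is a minimal sketch, not code from the thread: the parse tree is typed in by hand in the [ rule_name, var_name, data ] shape described above, and Demo::Target is a hypothetical stand-in for a target-language class.

```perl
use strict;
use warnings;

# Hand-written stand-in for a parse tree (an assumption for illustration):
# each statement is [ rule_name, var_name, data ].
my $parse_tree = [
    [ 'assign_scalar', 'integer', [ 'int', '1234' ] ],
    [ 'assign_array',  'ary',     [ [ 'char', 'foo' ], [ 'int', '123' ] ] ],
];

package Demo::Target;    # hypothetical target-language class

sub new { return bless {}, shift }

sub assign_scalar {
    my ( $self, $rule, $name, $value ) = @_;
    return "$name = $value->[1]";    # $value is [ type, text ]
}

sub assign_array {
    my ( $self, $rule, $name, $list ) = @_;
    # @$list turns the array ref back into a list of [ type, text ] pairs.
    return "$name = (" . join( ', ', map { $_->[1] } @$list ) . ")";
}

package main;

my $lang = Demo::Target->new;
my @lines;
for my $item (@$parse_tree) {
    my $method = $item->[0];                # e.g. the string 'assign_scalar'
    push @lines, $lang->$method(@$item);    # same as $lang->assign_scalar(...)
}
print "$_\n" for @lines;
```

      Because the method name is just a string held in $method, adding a new statement type only needs a new rule in the grammar and a method of the same name in each target class.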

      cheers

      tachyon

        Tachyon, thanks for the reply...I think I understand the answers to my questions now...I had no idea the rule name was passed back as $item[0].

        You're right about my not considering how complex this project is...the generic language can be anything, not necessarily what I proposed. I'm not quite sure what you mean when you say "why not use a real language?" Do you have any examples?

        I was trying to come up with a generic language that allows the basic data structures (element, array, hash) that will eventually get parsed and translated to a tool specific language...this way variables only need to be specified in one place and converted if needed. If you have any suggestions, please don't hesitate to share them.

        Once again, I really appreciate all of your knowledge and help!!

        regards,

        Phillip
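
        One possible reading of the "use a real language" suggestion is to write the variables as plain Perl data and let perl do the parsing. This is only a sketch under that assumption: %config and to_tcl are hypothetical names, and the emitter assumes every value is a simple word with no whitespace, braces, or quotes.

```perl
use strict;
use warnings;

# The "generic language" is plain Perl data: one hash of name => value.
# In a real project this could live in its own file and be loaded with do().
my %config = (
    var  => 10,
    var1 => 'variable1',
    var2 => [ 1, 2, 3 ],
    var3 => { key1 => 'value1', key2 => 'value2' },
);

# Hypothetical emitter: one Tcl command per variable.
# Assumes values are simple words (no whitespace, braces, or quotes).
sub to_tcl {
    my ( $name, $value ) = @_;
    if ( ref $value eq 'ARRAY' ) {
        return sprintf 'set %s [list %s]', $name, join( ' ', @$value );
    }
    if ( ref $value eq 'HASH' ) {
        my @pairs = map { "$_ $value->{$_}" } sort keys %$value;
        return sprintf 'array set %s {%s}', $name, join( ' ', @pairs );
    }
    return sprintf 'set %s %s', $name, $value;
}

my @tcl = map { to_tcl( $_, $config{$_} ) } sort keys %config;
print "$_\n" for @tcl;
```

        A second emitter with the same ( $name, $value ) interface could target Scheme or any other tool, so the variables are still specified in only one place.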


      Hi Phillip,

      Firstly, as tachyon says, these rules can get really funky really quickly, especially when you want to try to model nested things or conditional instructions (as I'm finding out with my tinkering).
      That aside, I've remodelled my rules (using my last example code) to accommodate the type of entries you wanted. First, here are some definitions.... In my little language these are my types...

      5 # Any number is a 'literal'
      "fluffy" # encased text I call an 'identifier'
      $cuddles # this is a 'variable'
      @animals # this is an 'array'
      %stuff # this is a hash


      Now if you follow the rules you'll see that you can have pretty much any combination of these...so you can do seksi things like this.. (put this in the text section from before as an example)

      %stuff = { animal => @pets, age => 5, name => "fluffy", colour => $col };


      (Just as a side note the $a = "hello" and $b = "world" should have already worked with my existing program, this bit is so you can embed 'variable's and 'array's in things)


      Replace all the bits in my 'rules' section from before with the following and that should spice things up...
          # --- Rules ---

          parse : stmt(s?) EOF { $item[1] }

          stmt  : variable ';' { $item[1] }
                | array    ';' { $item[1] }
                | hash     ';' { $item[1] }
                | <error>

          arrayelement : term ',' { [ @item[0, 1] ] }
                       | term     { [ @item[0, 1] ] }

          arrayname : ARRAY IDENTIFIER { [ 'array', $item[2] ] }

          # returns [ rule name, [ 'array', name ], list of elements ]
          array : arrayname EQUAL arrayelement(s?)
                  { [ @item[0, 1, 3] ] }

          hashelement : IDENTIFIER HASHASSIGN term ',' { [ @item[0, 1, 3] ] }
                      | IDENTIFIER HASHASSIGN term     { [ @item[0, 1, 3] ] }

          hash : HASH IDENTIFIER EQUAL '{' hashelement(s?) '}'
                 { [ @item[0, 2, 5] ] }

          variablename : VAR IDENTIFIER { [ 'variable', $item[2] ] }

          # returns [ rule name, [ 'variable', name ], value term ]
          variable : variablename EQUAL term
                     { [ @item[0, 1, 3] ] }

          term : QUOTE IDENTIFIER QUOTE { [ 'identifier', $item[2] ] }
               | LITERAL                { [ 'literal', $item[1] ] }
               | arrayname              { $item[1] }
               | variablename           { $item[1] }


      ....and the output for the example I gave you...
          $VAR1 = [
                    [
                      'hash',
                      'stuff',
                      [
                        [ 'hashelement', 'animal', [ 'array', 'pets' ] ],
                        [ 'hashelement', 'age', [ 'literal', '5' ] ],
                        [ 'hashelement', 'name', [ 'identifier', 'fluffy' ] ],
                        [ 'hashelement', 'colour', [ 'variable', 'col' ] ]
                      ]
                    ]
                  ];
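
      Getting data back out of a tree like this is mostly a matter of unpacking one level of array ref at a time. As a rough sketch (the names %render_term and render_hash are made up for this example, and the tree is simply the dump above typed back in), this turns the parsed hash statement back into its source text:

```perl
use strict;
use warnings;

# The Data::Dumper output above, typed back in as a literal structure.
my $tree = [
    [ 'hash', 'stuff', [
        [ 'hashelement', 'animal', [ 'array',      'pets'   ] ],
        [ 'hashelement', 'age',    [ 'literal',    '5'      ] ],
        [ 'hashelement', 'name',   [ 'identifier', 'fluffy' ] ],
        [ 'hashelement', 'colour', [ 'variable',   'col'    ] ],
    ] ],
];

# One code ref per term tag; each turns the term's text back into source form.
my %render_term = (
    literal    => sub { $_[0] },
    identifier => sub { qq{"$_[0]"} },
    variable   => sub { '$' . $_[0] },
    array      => sub { '@' . $_[0] },
);

sub render_hash {
    my ($stmt) = @_;
    my ( $tag, $name, $elements ) = @$stmt;
    my @pairs;
    for my $element (@$elements) {
        my ( undef, $key, $term ) = @$element;    # skip the 'hashelement' tag
        my ( $type, $text ) = @$term;
        push @pairs, "$key => " . $render_term{$type}->($text);
    }
    return '%' . $name . ' = { ' . join( ', ', @pairs ) . ' };';
}

my @statements = map { render_hash($_) } @$tree;
print "$_\n" for @statements;
```

      Swapping the code refs in %render_term for ones that emit another syntax is one way a per-language translator could be built on the same tree.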

      Have phun...
      Regards Paul
        Paul,

        Thank you for the reply...I will try these changes out after Jury Duty :-(

        From the examples you and tachyon provided, I realize that the parsed output can become very complex and get nasty real quick. Perhaps I'll need to set some limits on how many nested statements are allowed...otherwise retrieving this data is going to be a nightmare.

        Thanks again!

        -Phillip

        Paul, what version of Perl are you using to produce this output? I am getting errors with the following input data.

            my $text = q {@dogs = ["dollar","mack"];
            %myHash = {animals => @dogs, age = 5, names=> "fluffy"};
            };
            my $result = $parser->parse(\$text);

        OUTPUT

            ERROR (line -1): Invalid stmt: Was expecting ';' but found "["dollar","mack"];" instead
            Bad text.

        Thanks, -Phillip
