Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Hello fellow monks,

While there is already Parser::Combinators on CPAN, I write a simple parser combinator demo whose idea is mainly from Hutton92 (Graham Hutton, Higher-Order Functions for Parsing, Journal of Functional Programming 2(3):323343, July 1992). I try to make it look like boost.spirit - especially the sequence combinator ('then'), the alternative combinator ('alt') and the semantic action combinator ('using').

Here is the code of combinators:

{ package wraith; { package inner_lazy; sub TIESCALAR { my ($class, $val) = @_; bless $_[1], $class } sub FETCH { my ($self) = @_; $self->() } } use overload '>>' => "then_impl", '|' => "alt_impl", '**' => "using_impl"; sub deref { my @args = @_; for my $elt (@args) { if (ref($elt) eq "wraith_rule") { $elt = $$elt; } } @args } our $concat = sub { my @list_of_lists = @_; my @list; for my $elt (@list_of_lists) { push @list, $_ for @$elt; } \@list }; our $succeed = bless sub { my $v = $_[0]; bless sub { my $u = (ref($v) eq "ARRAY") ? $v : [ $v ]; [ [ $u, $_[0] ] ] } }; our $fail = bless sub { [] }; our $satisfy = bless sub { my ($p, $m) = @_; $m = sub { $_[0] =~ /(.)(.*)/s } if not $m; bless sub { if (my ($x, $xs) = $m->($_[0])) { if ($p->($x)) { return $succeed->($x)->($xs); } else { return $fail->($xs); } } else { return $fail->( [] ); } } }; our $literal = bless sub { my $y = $_[0]; $satisfy->( sub { $y eq $_[0] } ) }; our $literals = bless sub { my $y = $_[0]; $satisfy->( sub { index($y, $_[0]) != -1 } ) }; our $token = bless sub { my ($tok, $skip) = @_; $skip = '\s*' if not $skip; $satisfy->( sub { 1 }, sub { $_[0] =~ /^$skip($tok)(.*)/s } ) }; sub alt_impl { my ($p1_, $p2_, $discard) = @_; bless sub { my ($p1, $p2) = deref($p1_, $p2_); my $inp = $_[0]; $concat->($p1->($inp), $p2->($inp)) } } our $alt = bless \&alt_impl; sub then_impl { my $arglist = \@_; bless sub { my ($p1) = deref($arglist->[0]); my $inp = $_[0]; my $reslist1 = $p1->($inp); my $finlist = []; for my $respair (@$reslist1) { my ($p2) = deref($arglist->[1]); my $reslist2 = $p2->($respair->[1]); for my $finpair (@$reslist2) { push @$finlist, [ $concat->($respair->[0], $finpai +r->[0]), $finpair->[1] ]; } } $finlist } } our $then = bless \&then_impl; sub using_impl { my ($p_, $f, $discard) = @_; bless sub { my ($p) = deref($p_); my $inp = $_[0]; my $reslist = $p->($inp); my $finlist = []; for my $respair (@$reslist) { push @$finlist, [ $f->($respair->[0]), $respair->[1] ] +; } $finlist } } our $using = bless \&using_impl; sub many_impl { my $p = $_[0]; my $f; tie $f, "inner_lazy", sub { many_impl($p) }; $alt->($then->($p, $f), $succeed->( [] )) } our $many = bless \&many_impl; } { package wraith_rule; our @ISA = qw ( wraith ); sub makerule { bless $_[0] } sub makerules { my ($class, @args) = @_; for my $elt (@args) { $elt = makerule($elt); } @args } }

wraith_rule seems ugly. But I have to make it a subclass of the combinator class wraith for convenience of writing rules further. The idea of Hutton92 is not monadic nor memorized. Perhaps I will try those techniques later.

Below is a demo for using the combinators. It demonstrates a simply modified untyped lambda calculus language, which is later translated into perl snippets. It's good to see the great similarity between Perl and lambda calculus - except the eval order :D

# test case: a lambda-calculus-to-perl translator my %expr_root = ( "kind" => "list", "defn" => [], "term" => [] ); my $rootref = \%expr_root; my ($formlist, $form, $term, $varlist, $appterm, $aterm); wraith_rule->makerules(\$formlist, \$form, \$term, \$varlist, \$appter +m, \$aterm); $formlist = $wraith::many->(\$form); $form = ( (\$term >> $wraith::token->(';')) ** sub { [ { "kind" => "term", "body" => $_[0]->[0] } ] } ) | ( ($wraith::token->('[A-Za-z_]+') >> $wraith::token->('=') >> +\$term >> $wraith::token->(';')) ** sub { [ { "kind" => "defn", "name" => $_[0]->[0], "body" => +$_[0]->[2] } ] } ); $term = ( (\$appterm) ** sub { [ { "kind" => "appl", "body" => $_[0]-> +[0] } ] } ) | ( ($wraith::token->('\\\\') >> \$varlist >> $wraith::token->(' +\.') >> \$term) ** sub { [ { "kind" => "abst", "para" => $_[0]->[1], "body" => +$_[0]->[3] } ] } ); $varlist = ($wraith::many->($wraith::token->('[A-Za-z_]+'))) ** sub { +[ $_[0] ] }; $appterm = ($wraith::many->(\$aterm)) ** sub { [ $_[0] ] }; $aterm = ( ($wraith::token->('\(') >> \$term >> $wraith::token->('\)') +) ** sub { [ { "kind" => "applterm", "body" => $_[0]->[1] } ] } ) | ( ($wraith::token->('[A-Za-z_]+')) ** sub { [ { "kind" => "ap +plvar", "val" => $_[0]->[0] } ] } ); sub emitabst; sub emitappl; sub emitapplterm; sub emitapplvar; sub emitterm; sub emitdefn; my %emitmethods = ( "term" => \&emitterm, "defn" => \&emitdefn, "appl" => \&emitappl, "abst" => \&emitabst, "applterm" => \&emitapplterm, "applvar" => \&emitapplvar ); sub emitabst { my $abstref = $_[0]; my $params = $abstref->{"para"}; my $nparams = @$params; my $c_param = shift @$params; my $codefrag = undef; if ($nparams) { $codefrag .= "sub { my \$$c_param = \$_[0]; "; } if (@$params) { $codefrag .= emitabst($abstref); } else { $codefrag .= $emitmethods{$abstref->{"body"}->{"kind"}}->($abs +tref->{"body"}); } $codefrag.' }' } sub emitappl { my $applref = $_[0]; my $oplist = $applref->{"body"}; my $codefrag = undef; my $addparen = 0; while (@$oplist) { my $opitr = shift @$oplist; if ($addparen) { $codefrag .= '( '; } $codefrag .= $emitmethods{$opitr->{"kind"}}->($opitr); if ($addparen) { $codefrag .= ' )'; } if (@$oplist) { $codefrag .= '->'; $addparen = 1; } } $codefrag } sub emitapplterm { my $atermref = $_[0]; $emitmethods{$atermref->{"body"}->{"kind"}}->($atermref->{"body"}) } sub emitapplvar { my $varref = $_[0]; '$'. $varref->{"val"} } sub emitterm { my $termref = $_[0]; $emitmethods{$termref->{"body"}->{"kind"}}->($termref->{"body"}) } sub emitdefn { my $defnref = $_[0]; 'my $' . $defnref->{"name"} .' = '. $emitmethods{$defnref->{"body" +}->{"kind"}}->($defnref->{"body"}) .';' } my $res = $formlist->('true = \x y.x; x x y; Y = \f.(\x y.f (x x)) (\x + y. f (x x));'); for my $itr (@{$res->[0]->[0]}) { if ($itr->{"kind"} eq "term") { push $expr_root{"term"}, $itr; } else { push $expr_root{"defn"}, $itr; } } my ($defnlist, $termlist) = ($rootref->{"defn"}, $rootref->{"term"}); print "# defnlist: \n"; for my $defnitr (@$defnlist) { print emitdefn($defnitr); print "\n"; } print "# termlist: \n"; for my $termitr (@$termlist) { print emitterm($termitr); print "\n"; }

The complete code is available at https://github.com/Akvelog/a-little-five/blob/master/_5wraith.pl

Any advise is welcome. Feel free to correct any of my mistakes (especially usage of English for I am not a native speaker).

-------- update --------

For people not familiar with parser combinators:

Parser combinators are functions which take a string and return a intermediate analysis result -- AST, tuple, calculation result, .etc. Their parsing ability is slightly better than LL(1) parsers -- they are able to process products with common prefix but still die on left-recursive grammars.

The usage of combinators is quite simple: just rewrite the EBNF, replacing each symbol with a combinator (basic combinator or a compsite combinator generated by applying operators 'then', 'alt', 'many', .etc). The combinator corresponding to the start symbol of the grammar is the expected parser. Apply a string to the parser and there will be a list of analysis results:

( (result_1, input_left_1), (result_2, input_left_2), ... (result_n, input_left_n) )

where input_left_i is the unprocessed input string. Obiviously, the results with empty unprocessed input strings are correct. If the language is not ambiguous, there will be only one correct result.

Two of the well-known parser combinator implementations are boost.spirit (c++) and parsec (haskell). Readers may found some interesting info or tutorials in their documentation.

People not familiar with lambda calculus may refer to http://perl.plover.com/lambda/ . This article demonstrates how Perl itself contains the (untyped) lambda calculus with code demo, essays and slides. Readers may found it interesting.

-------- update Sep 17 2013 --------

I made it a CPAN module Wraith. All suggestions and criticisms are welcome.


In reply to Simple Parser Combinator Implementation by withering

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (5)
    As of 2014-12-21 01:30 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (100 votes), past polls