http://www.perlmonks.org?node_id=933709


in reply to Re: Writing a Programming Language in Perl
in thread Writing a Programming Language in Perl

Marpa - Parse any Language You Can Describe in BNF

Hopefully, it can do a bit more than that, because associativity cannot be described in BNF.

  • Comment on Re^2: Writing a Programming Language in Perl

Replies are listed 'Best First'.
Re^3: Writing a Programming Language in Perl
by Your Mother (Archbishop) on Oct 25, 2011 at 21:47 UTC

    Definitely not my forte but from the outside it looks wonderful. I'd love to hear your impressions of it if you're inclined and have time to take it though some of its paces.

      In theory, it works in practice.

      At a glance, it seems to me it assumes a simplistic world. The world doesn't conform to BNF. That's why Perl regular expressions aren't actually regular. As I said above, associativity is an example of something BNF can't handle. In BNF, all three of the following are the equivalent:

      foo : foo bar foo # Left- and right-recursive foo : baz foo : foo bar baz # Left-recursive foo : baz foo : baz bar foo # Right-recursive foo : baz

      You might think one could deviate from BNF and declare the second form to be left-associative and the third form to be right-associative.

      However, (most?) parsers can only handle left-recursion or right-recursion, not both. That means one cannot support both the second and the third form, which means we're back to square one.

      You need special tools to handle associativity (like PRD's <leftop> and <rightop>) or more generically, the ability to pass down context (like PRD's subrule args). I have examples of both methods.

      The ability to pass down context is surely useful in other circumstances, too. It can help if you have ambiguities in your grammar. It can probably help with specialised error messages.

        Marpa will handle both left- and right-recursion in linear time. Marpa would accept directly all three of the grammars that you describe. The left- and right-recursive grammars would be parsed in linear time. The one which is both left and right-recursive is ambiguous, and so might be non-linear.

        Marpa parses only context-free grammars, so its core algorithm does not handle context. But neither does PRD's core algorithm, which is LL and considerably less powerful than Marpa's. Both Marpa and PRD use "add-on" features to allow context sensitivity.

        When it comes to extensions like context-sensitivity, PRD has the advantage of being based on recursive descent, which modern programmers find intuitive, and which has a very long track record. Marpa also has features which allow context-sensitivity. For the most general case, you can take the context-free subset of the grammar, produce an ambiguous parse, and then using ranking and/or selection. A more targeted approach could use Marpa's ability to track the progress of the parse token-by-token, and alter the input based on what it sees.

        Marpa's underlying algorithm is highly mathematical and far from intuitive. But this comes with a compensating advantage -- Marpa can and has been converted to C, in which form it is over 20 times as fast. Recursive descent's advantages come from easy customization, making C conversion difficult or impractical.

        So for context-sensitivity, it's a matter of different strategies. My own highly biased opinion is that many applications will find Marpa an advance over recursive descent.