fergal has asked for the wisdom of the Perl Monks concerning the following question:

Just curious if anyone knows what the runtime complexity for Perl 6's grammar engine will be. If it's a souped-up version of Parse::RecDescent, does that mean it's going to be worse than O(l) (l being the length of the string to parse)? I think recursive descent without memoization is O(l^3).

That might cause trouble for long source files.

Replies are listed 'Best First'.
Re: Perl 6 rules and complexity
by TimToady (Parson) on Feb 17, 2006 at 02:02 UTC
    Well, I think O() notation breaks down for recursive descent, especially in the absence of information about the grammar and the kind of input expected. Certainly it can't do better than O(l), but it doesn't have to do a lot worse unless you're planning to do a lot of backtracking. As with Perl 5 regex behavior, you're going to do much better with patterns that succeed than with patterns that fail. I suspect memoization is only going to help you if you're in the situation of failing frequently. After all, if you're succeeding, you're changing which token you're looking at, and that will by definition blow up any memoization cache because you're changing your inputs.

    For recursive descent without backtracking, you're going to pay in only two dimensions, one of which is the length of the input, but the other of which is the depth of your grammar, modulated by how often your input forces you to traverse that depth. Your typical computer language forces you down through all the levels of precedence on every term.
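    To make those two cost dimensions concrete, here is a toy recursive-descent expression parser (sketched in Python for brevity rather than Perl; the grammar and names are invented for this example). Even a bare literal like 42 has to descend through every precedence-level function before it bottoms out at an atom, which is the grammar-depth cost described above.

```python
# Toy recursive-descent expression parser with two precedence levels.
# Every term, even a lone literal, is routed through expr -> term -> atom,
# so cost scales with grammar depth as well as input length.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():            # lowest precedence: '+'
        left = term()
        while peek() == '+':
            eat()
            left = ('+', left, term())
        return left

    def term():            # next level up: '*'
        left = atom()
        while peek() == '*':
            eat()
            left = ('*', left, atom())
        return left

    def atom():            # highest precedence: a bare token
        return eat()

    return expr()

tree = parse(['1', '+', '2', '*', '3'])   # -> ('+', '1', ('*', '2', '3'))
```

    With only two levels the overhead is trivial, but a grammar with dozens of precedence levels pays that full descent on every term, which is the hit the next paragraph is talking about.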

    Now, as it happens, Perl 6's grammar is a fairly deep grammar in terms of precedence levels, so that would be a big hit for a pure recursive descent approach. Fortunately, as chromatic pointed out, we're not really purists. So our approach is to try to have our cake and eat it too. We'll use recursive descent as the default semantics, but allow various declarative ways of modifying that. In particular, for parsing Perl 6 itself, we'll intermix an operator precedence grammar for expression parsing, and just use recursive descent for large scale structures as well as individual tokens that contain expressions. That should give us about a 20-times speedup over pure recursive descent, give or take a few orders of magnitude.
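    The operator-precedence half of that hybrid can be sketched with precedence climbing (again a Python sketch with an invented operator table, not the actual Perl 6 engine): one loop plus a table replaces the chain of per-level functions, which is also why a new precedence level can be added as a table entry rather than a new layer of recursion.

```python
# Precedence climbing: a single recursive loop consults a precedence
# table instead of descending through one function per level.

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}   # hypothetical operator table

def parse_expr(tokens, pos=0, min_prec=1):
    left = tokens[pos]                    # an atom
    pos += 1
    while pos < len(tokens) and PREC.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # PREC[op] + 1 makes operators at the same level left-associative
        right, pos = parse_expr(tokens, pos + 1, PREC[op] + 1)
        left = (op, left, right)
    return left, pos

tree, _ = parse_expr(['1', '+', '2', '*', '3'])
# tree -> ('+', '1', ('*', '2', '3'))
```

    Each token is visited a bounded number of times regardless of how many precedence levels the table holds, which is where the speedup over pure recursive descent comes from.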

    This approach also allows us to insert new precedence levels on the fly if the user requests them. When a user adds a macro or an operator, they might not even realize they're changing the Perl grammar on the fly.

    But even leaving macros and such out of it, we're also trying to keep the whole ordinary rule syntax declarative enough that you could use a pragma at the top of your grammar to install some other kind of grammar engine, LALR or whatever. As long as the effects are limited to the current lexical scope, that's fine. "All is fair if you predeclare."

    But the basic approach remains recursive descent in flavor, because for all its faults, recursive descent can give you very good error messages. Another hybrid notion that we'll likely be exploring is to go ahead and do a fast bottom-up parse in spots, but when something goes haywire syntactically, go ahead and backtrack a little to try a recursive descent parse, not to try and second-guess the bottom-up parse and "fix up" the user's code, but rather to try and determine where the user went wrong, and give some guidance. Epistemologically speaking, a recursive descent parse can keep track of the dubious decision points of the user and see if a different hypothetical decision would have resulted in a successful parse. If so, we don't actually do any code generation--the code is wrong, after all--but just whack the user upside the head with a more clueful cluebat than we could have with a bottom-up parse. The problem with most bottom-up parsers is they can't tell you anything much more specific than "Something's wrong at line 42."

      Yeah, it depends on the grammar but worst case is exponential (according to this guy anyway). If it's memoizable it's guaranteed to be linear in the length of the string being parsed (worst case is @rules * length($string) ). This only includes a very limited type of backtracking, once you allow more powerful regex-like backtracking on quantified rules it becomes very easy to lose linearity even with memoization.
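      The memoized ("packrat") case can be sketched in a few lines (a Python sketch over an invented toy grammar, not Perl 6 rules): caching the result per (rule, position) pair means each pair is computed at most once, which is where the @rules * length($string) bound comes from, at the cost of memory proportional to the cache.

```python
# Packrat-style memoized recursive descent: each (rule, position) pair
# is evaluated at most once, so total work is O(rules * input length).
from functools import lru_cache

def make_parser(text):
    # Toy grammar (invented for this sketch): S -> 'a' S | 'a'
    @lru_cache(maxsize=None)              # the packrat memo table
    def S(pos):
        if pos < len(text) and text[pos] == 'a':
            rest = S(pos + 1)             # try the recursive alternative
            return rest if rest is not None else pos + 1
        return None                       # rule fails at this position

    return S

parse = make_parser('aaaa')
end = parse(0)    # -> 4, i.e. the whole input was consumed
```

      As noted above, this only stays linear for the limited ordered-choice style of backtracking; regex-like backtracking over quantified rules can defeat the cache.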

      It would be unfortunate if people could write programs that neither compiled nor failed in a reasonable time, or if two pieces of code that each compiled quickly on their own compiled much more slowly when appended.

      The ability to cause arbitrary side effects in rule actions makes that far easier to do, and possibly even guarantees it for anything but the simplest grammars (for example, any grammar that is an extension of Perl 6's).

        To put that in context: it is known that type inference in languages like ML is DEXPTIME-complete (http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=96748; my Masters thesis was a modified proof of that :))
Re: Perl 6 rules and complexity
by chromatic (Archbishop) on Feb 16, 2006 at 21:17 UTC

    It's not clear exactly what it will be, but it won't be completely recursive descent. Hopefully it will be as little recursive descent as possible.

    Looking at PGE is probably the best way to gauge the algorithmic behavior of the grammar.

      Judging by Apocalypse 5 it won't be possible to execute arbitrary code in rule actions. At least I can't find any examples of anything besides manipulating "hypothetical" variables, so maybe it will be memoizable.

      Unfortunately, after a quick look, I don't fancy digging through PIR, especially as that seems to be in a very early stage and probably bears no resemblance to its final state.

        Any arbitrary code is allowed in rule actions, just as you can put any arbitrary code into a yacc grammar's reductions. Perl likes to optimize for power over purity unless you specify otherwise somehow.