Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Perl is not Dynamically Parseable

by Jeffrey Kegler (Friar)
on Oct 12, 2009 at 04:11 UTC ( #800599=perlmeditation: print w/ replies, xml ) Need Help??

As already suggested in perlmonks, the proof that Perl parsing is not, in general, decidable applies equally to all parsing by the Perl interpreter or by Perl scripts. This point had already been made in my series on "Perl and Undecidability" in The Perl Review. [ This is now available online ]. Despite this, some confusion remains on whether, in general, Perl might be "dynamically" parseable.

The Perl Parsing Undecidability proof shows that a Turing machine cannot, in general, parse Perl. Whether running a script, or parsing, the Perl interpreter, and any program implementing the Perl language is, as far as Perl parsing goes, a Turing machine.

(A Turing machine is the best and most standard theoretical model for all general-purpose languages and machines. Technically, Perl features like access to the system clock are not in the Turing machine model, but it is simpler, and entirely safe, to ignore these non-Turing Perl features. The system clock isn't going to help you parse Perl.)

This means that how you define "static" and "dynamic" Perl parsing does not matter. All Perl processing is at best equivalent to a Turing machine. And therefore none of it suffices to parse Perl in general.

The confusion does have a reasonable basis in some common sense observations. In practice, Perl almost always does parse itself just fine, despite any difficulties people may have writing tools to analyze Perl scripts. And there is a very convincing (though fallacious) counter-argument. In what follows I deal with one.

Fallacious Counter-Argument: First, let's reduce the problem of Perl parsing to the issue of determining whether a function's prototype is nullary or not. If it's proved that determining nullarity is, in general, decidable, it reduces my original proof to rubble. It probably also generalizes to a full counter-proof.

Consider the following code snippet:

sub runtime_nullary { my $function = shift; return 0 if not defined( my $prototype = prototype $function ); return $prototype eq q{}; } ## end sub runtime_nullary print 'nullary dunno: '; say runtime_nullary('dunno') ? 'yes' : 'no'; print 'result is '; say dunno +4;
First take some Perl code. It must be free of fatal errors, but otherwise can be "arbitrary". ("Arbitrary" in math-speak means "anything you like".) Now let's make 4 observations about it.
  1. Something like the above snippet can placed to execute first (or early) in the run phase of the arbitrary code.
  2. We can place this snippet in the arbitrary code so that we always reach the snippet and execute it.
  3. Once runtime_nullary runs, we know whether the function named in its argument is nullary or not.
  4. We can run runtime_nullary on any function we like, and on as many functions as necessary.

So therefore, Fallacious Conclusion: We can determine, for arbitrary Perl code, whether any Perl function has a nullary prototype or not.

First, let me concede that the Fallacious Conclusion does follow from Observations 1-4. Let me also concede that Observations 1, 3 and 4 are correct. And let me also concede that if Fallacious Conclusion is correct, my Perl Parsing Undecidability Proof (and probably any other) is kaput.

But what about Observation 2? It looks solid, but it hides a major fallacy. We reach the run phase only if we get out of the compile phase. True, this is something we usually take for granted, but if we assume the compile phase eventually ends (or that we can tell whether it will or not), we assume there is no Halting Problem in the compile phase. And if we assume that there is no Halting Problem in Perl's compile phase, we assume that the compile phase does not have Turing machine power.

Perl's compile phase does have Turing machine power, and therefore is subject to the Halting Problem. This is not just a fact, but it is exactly that property of Perl which causes Perl parsing to not, in general, be decidable. The ghost of the Halting Problem must always haunt those who seek the power of the One Universal Machine to Bind Them All.

Update, 11 Oct 2009: I simplified the code snippet a bit to focus on my point in this post. The original is on page 25 of the Fall 2008 issue of The Perl Review.

Update, 12 Oct 2009: Based on feedback from blokhead and ikegami, revised the Fallacious Counter-Proof to emphasize that the problem lies in determining nullarity for arbitrary code.

Comment on Perl is not Dynamically Parseable
Download Code
Re: Perl is not Dynamically Parseable
by chromatic (Archbishop) on Oct 12, 2009 at 04:26 UTC

    Accurate and true, however if I catch you writing code like that for any purpose other than this proof, I will take your keyboard away.

    Practically speaking, this is not a problem in Perl programs reasonably written with maintainability in mind.

      The code examples for the TPR undecidability series were intended as thought problems. Typically they were meant to fail in thought-provoking ways, ways that were usually unexpected. Truly not the kind of code you'd like to maintain.

      In one of my TPR drafts, I said that, if you read my code without understanding its objective, you'd think the proof was that I have no idea how to write code. I think that remark wound up on the cutting room floor.

        Most people reading this site regularly understand that. Yet even I find it surprising how many other people skimmed your article and came away with a very, very different misunderstanding of the implications in practice.

Re: Perl is not Dynamically Parseable
by ikegami (Pope) on Oct 12, 2009 at 05:15 UTC

    Fallacious Conclusion: We can in general determine whether any Perl function has a nullary prototype or not.

    It's not fallacious. There's no guarantee that the answer won't change later or that it'll be the same at the same point of execution in a different interpreter, but we can indeed determine whether any Perl function has a nullary prototype or not.

      It's fallacious because you need to know you will reach the code which can make the determination. And to know, in general, that you will reach that nullarity-determining code you've got to know the answer to the Halting Problem in every form it can take. Being able to determine the nullarity of a Perl function, given arbitrary Perl code, amounts to being able to fully predict the operation of a Turing machine.

        It's fallacious because you need to know you will reach the code which can make the determination

        That makes no sense. No matter what code is being compiled or executed, it can be determined whether any Perl function has a nullary prototype or not. No piece of Perl code needs to be reached.

Re: Perl is not Dynamically Parseable
by blokhead (Monsignor) on Oct 12, 2009 at 15:41 UTC
    Perhaps a clearer distinction is the following:
    It is easy to determine whether a defined sub has a nullary prototype.
    Indeed, your runtime_nullary sub determines just that. And "defined" here is important, as we will see below. On the other hand,
    It is undecidable to determine whether an arbitrary computation (say, the BEGIN phase of a perl script) results in defining a sub with nullary prototype.
    This is demonstrated by your previous undecidability posts. I agree with ikegami in terms of how things are worded. Your "fallacious conclusion" is not fallacious. Its wording makes it sound like the hard part is: given a code ref, determine its prototype. But you can always determine the prototype of a Perl sub. What you can't do is determine whether an arbitrary piece of code will result in defining a nullary prototyped sub. Update: it is this "given arbitrary code" part that I think is not formally exposited in your post.

    The thing about "reaching" the determining code is a red herring. It's just part of one (bad?) method of trying to determine the outcome of an arbitrary computation -- by running the computation and then (when it is done) looking for its result. Of course we know that can't work, but it's not saying much more than "you can't wait for a computation to finish, when you're not sure whether the computation will finish".

    You could say that this is directly analogous to the halting problem itself. It is easy to determine whether a Turing Machine that has already halted has halted with 0 or 1 on its output tape -- just look at the tape! However, it is undecidable to determine, given a Turing machine, whether it will eventually halt with 0 or 1.

    In our context, the sub's prototype is just a place to store the outcome of an arbitrary Turing Machine computation. The problem of reading or interpreting the computation's outcome (determining the sub's prototype) is not undecidable, it's the problem of determining what that outcome of an arbitrary computation will be.

    I hope this helps making the distinction clearer & more precise.

    blokhead

      Thanks for the feedback. In response, I've revised the original post in a way that I hope improves it.

      An alternative approach would be to simply state the result and that it is a direct and almost trivial consequence of Rice's Theorem. This result is indeed so trivial mathematically that it is unpublishable, but in a math journal, that's how it would be presented. The article would be very short.

      Because of the significance of these results, I've labored to present them to readers who are Perl-literate, but not fluent in Theory of Computation. The risk is always that a translation from math-speak will seem sloppy to those who understand it and unconvincing to those who don't.

        Concerning your update:

        So therefore, Fallacious Conclusion: We can determine, for arbitrary Perl code, whether any Perl function has a nullary prototype or not.

        It's not clear what that means. Are you talking about a function declaration in the source, a function instance produced by a given parser instance, or the function instances produced by every parser instance.

        First, let me concede that the Fallacious Conclusion does follow from Observations 1-4.

        You can't concede the validity of an argument. You are making an assertion.

        And your assertion is false. The conclusion doesn't follow from the premises (anymore?).

        Perl's compile phase does have Turing machine power,

        You can't use that premise. That's the conclusion of the argument you're trying to verify.

Re: Perl is not Dynamically Parseable
by ikegami (Pope) on Oct 12, 2009 at 21:37 UTC

    The approach I would have taken:


    The hypothesis is

    Perl cannot be parsed

    which is to say

    There exists a program for which two instance of the Perl parser produce different output

    which is true if

    There exists a function call in a program for which two instance of the Perl parser have different prototypes for the named function.

    This is actually very simple to prove by example.

    However, you tried to prove this by counter-proof. That means you need to disprove the counter-hypothesis

    For every function call in every program, every instance of the Perl parser will have the same prototype for the named function.

    No problem there either. This can be disprove with the same example.

      However, you tried to prove this by counter-proof.

      Actually, no. The whole presentation of the counter-argument only attempts to prove that that line of counter-argument (which has been put forward in a few variations) fails. It doesn't attempt to prove Perl Unparseability. If that was your expectation I can certainly see why you saw it as falling short. My presentation of the counter-argument is like an exposition of a variation on the Sicilian Defense intending to show a chess maven why he should never play it.

      Arguably the post might have been stronger without it. I've dumped it into a "readmore".

      The proof is given in three forms in the TPR articles. The strongest form is the one that uses Rice's Theorem. The other two may be considered explanations intended to help those trying to understand why Perl Parsing Undecidability is provable.

      I'd very much look forward to seeing a full write-up of this proof. I suspect that many will prefer it to any of mine. In fact, one of the people who prefer your proof might be me.

      It is very common in math to publish new proofs of old results. In any field, the first statement of something is seldom either the clearest or the most elegant.

        Let's find a program where the prototype is non-deterministic and affects context.

        BEGIN { *call_to_inspect = ( rand() < 0.5 ? sub ($) {} : sub (@) {} ); }

        You can detect some changes in the compile-time state of a prototype by its effect on context.

        BEGIN { *call_to_query = ( rand() < 0.5 ? sub ($) {} : sub (@) {} ); } sub inspector { print wantarray ? 1 : 0, "\n" } call_to_inspect(inspector());

        Let's observe:

        >perl a.pl 1 >perl a.pl 1 >perl a.pl 0 >perl a.pl 1

        Therefore, there exists a function call in a program for which two instance of the Perl parser have different prototypes for the named function.

        Therefore, there exists a program for which two instance of the Perl parser produce different output.

        Or for short, Perl cannot be parsed.

      I'm not sure what you mean here by "two instances of the Perl parser". I'm pretty sure I know what you're getting at, but I wouldn't call it two instances of the parser. It is more like two possible syntactic interpretations/parses of the same piece of code, depending on the prototype of a previously defined sub.

      Saying "two instances of the parser" produce different things (on the same given code, presumably) makes it sound simply like the parser is non-deterministic or randomized.

      blokhead

        It is more like two possible syntactic interpretations/parses of the same piece of code,

        And what produces those two possible parses? Two parser instances. We're saying the same thing.

        Saying "two instances of the parser" produce different things (on the same given code, presumably) makes it sound simply like the parser is non-deterministic or randomized.

        Good.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://800599]
Approved by GrandFather
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2014-12-20 19:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (97 votes), past polls