Syntactic Confectionery Delight  
PerlMonks 
Turing completeness and regular expressionsby JadeNB (Chaplain) 
on Nov 28, 2009 at 04:32 UTC ( #809842=perlmeditation: print w/ replies, xml )  Need Help?? 
This was originally intended to be a CUFP, but I thought that the story behind the problem was at least as interesting as the solution; and, the more I wrote of the story, the more meditative it seemed. (I think that, even if the specific code doesn't catch your attention much, at least one of the linked topics might!) I hope that this was the right place for it to end up. The question has (implicitly) rung out, on Perlmonks, Proggit, and, one must presume, the haunted nightmares of many: Are Perl regexes Turing complete? This question caught the eye of yours truly, except that I confused the Turing completeness of regexes with the Turing completeness of substitutions, which led to an apparently unmotivated question and which required ikegami to set me straight. Now I think that I've got the question: Can Perl regexes be used to recognise an arbitrary recursively enumerable language? The stirring answer: No, sort of ^{*}. Why not? Well, a regex can be simulated by a linearbounded automaton (that's the hyphenation, and I'll stick by it), and these can recognise only the contextsensitive languages (not all the recursivelyenumerable ones, as a Turing machine can). Another way to say this is that regexes can only recognise languages in NSPACE(n). Thought of in this way, it's not so surprising: A regex has finitely many capturing groups, say k of them, each of which can be stored in space equal to the length, n, of the input (since they will, after all, only contain substrings of the input); so we only need space k*n to perform the ‘superDFA’ part of Perl regex matching ^{**}. I suspect this upper bound is far Well, so how about the question which I understood in the first place: Can Perl substitutions be used to perform any computation that a Turing machine can, viewing the string on which they act as the input tape? Again: No, sort of ^{*}. After all, a substitution consists just of a match—which can be completed by an LBA, as above—followed by the assembly of the substitution string—which, again, involves only constant overhead (the literal, or constant, part of the match) over a constant multiple of the input (the part coming from interpolation of capture variables—any other variables are not affected by the match, and so may as well be regarded as constants). Thus, substitutions can still handle only CSLs. Well, that's a little anticlimactic. So, what is true? Well, recursive Perl substitutions are Turing complete. As I pointed out elsewhere in the linked thread, that's known; but, hey, if I can figure out in a month's work what's obvious to MJD, then I figure that I'm doing all right. :) OK, so, that was a post full of negatives followed by a bold assertion; let's substantiate it. The obvious way to show Turing completeness is to simulate a Turing machine, but, man, that's been done to death. I heard the plea of cdiggins on LtU, who wondered why everyone prefers the λ calculus over the SKI combinatorcalculus, and determined that I would not be among those hip and trendy kids who think only of the former. I quickly came up with: which does a great job of reducing (fully parenthesised) terms from the calculus ^{****}, but is a little unsatisfying because it requires recursive regexes, which I've just said that I don't know aren't already Turing complete. All right, so, moving on. Well, we know that Conway's game of life is Turing complete, so maybe that's a good place to start; but it's played out on a 2dimensional grid, and translating that 2dimensional grid into the 1dimensional ‘tape’ for a regex seemed to be a little hairy. So, a 1dimensional version—well, that sounds like cellular automata. Everybody's favourite new kind of scientist, who studied these extensively, has a favourite—which we know, thanks to Matthew Cook, is Turing complete. So, let's simulate that! The problem is that there are 8 different rules for cell propagation, and running 8 different substitutions seems sort of profligate; can we get it down to 2, as we did for the SKI combinatorcalculus above? If (as I did) we get stuck on that, can we at least show that, if we could do it, we'd have got an optimal result—that is, that a 1regex substitution can never be Turing complete? I spent a few sleepless nights on this; I was sure that it wasn't possible—after all, one of Turing's criteria for a universal computer is that it should be able to make choices; but our subPerl5.10 regexes can't, so we should have to build in at least one choice “manually”, on the level of multiple regexes. How to prove it, though? I thought that I could come up with some argument from Kolmogorov complexity, but no luck—because, it finally occurred to me, I was guilty of the same blinkered thinking as the designers of the early, “fixedprogram” computers, and was trying to shove the entire program into the architecture. Why not instead let the program be recorded in the data? Out of that idea came the following code for implementing rule 110 (or, rather, rule 124, which propagates to the right instead of the left): (The heredoc syntax is fake—at least for me, under bash, perl complains No code specified for e—but I assume that someone out there knows how to fix it.) This accepts input a string of 0s and 1s, which is regarded as infinitely padded with 0s on both sides, and prints out its successive evolutes under rule 124. For example, here's a sample run, showing the evolution of the input 1 (which stands for ...0001000...): (It goes on forever, so don't wait for the surprise finish. :) See the list of projects below!) Emboldened by this success, I thought that perhaps I could tackle the original problem of reduction for the SKI combinatorcalculus. To keep things simple (well, for me), I decided to use the Unlambda syntax, whereby ` is used as the notation for application, so that (SK) becomes `SK. I also wanted to include support for variables; again to keep life simple, I stuck with the austere CSer's set x, x', …. The code is as follows: This accepts strings according to the grammar where t is the start symbol; p is another nonterminal; and S, K, x, ', and ` are terminals. It then reduces them according to the rules of the SKI calculus. Here is a sample run, showing that it correctly reduces the basic combinators in the BCKW calculus (the comments are just that, comments; they are not actually typed): This handles only textual, not functional, equivalence; for example, it correctly reduces ```SKKx to x, but it does not recognise that ``SK``SKK can be replaced by ``SKK. Note, incidentally, that, if we had genuine variablewidth lookbehind (instead of simulating it with \K), then we could get rid of Perl control structures and just hoist this into a single substitution, at the expense of considerably more wasted space; 1 while s/$orig/$repl/ would become s/(?<=0$orig)/0$repl/g, where 0 could be any symbol that does not occur in the input or any of our productions. Some projects:
^{*} What does ‘sort of’ mean? An obvious loophole is that, any time you can embed arbitrary Perl code, you've got as much computational power as Perl—that is to say, Turing completeness. Thus, the (?{ }) and (??{ }) escapes are right out—which is just as well, since they're considered UPDATES: • Sorry, forgot the <readmore>. • Added another project. • Oops, I'd overescaped the perlre links. I think that they should work now (except for the “subPerl5.10” one anchored at (?(condition)yespatternnopattern), which I can't figure out how to handle, because of the ), but please leave a comment or PM me if any of the links are broken.
Back to
Meditations

