 Perl-Sensitive Sunglasses PerlMonks

### Are Perl patterns universal?

by sleepingsquirrel (Hermit)
 on Nov 09, 2004 at 01:27 UTC Need Help??

--or--

Why compute with anything else?

Lately I've been wondering about just how powerful Perl's pattern matching language is (commonly known as regular expressions). Because of backtracking and recursion, Perl's patterns can match more than regular languages. They can match (at least some) context-free grammars...
```#match context free grammar --[0{N}1{N}] (e.g. 01, 0011, 000111, etc.)
\$cf = qr/01|0(??{\$cf})1/;

\$init = "00001111";
print "match\n" if (\$init=~m/^\$cf\$/);
...they can also match (at least some) context sensitive grammars...
```#match context sensitive grammar --[0{N}1{N}2{N}] (012, 001122, 000111
+222, ...)
\$h=qr/01|0(??{\$h})1/;
\$t=qr/12|1(??{\$t})2/;
\$cs=qr/(?=\$h 2+\$)(?=0+\$t\$)/x;

\$init = "001122";
print "match\n" if (\$init=~m/^\$cs\$/);
(I've haven't proven those two snippets correct, so any counter examples would be appreciated)

We can also craft patterns that loop forever (not merely a long time) (well, ignoring stack overflows that is)

```#Word boundries are zero-width so the regex engine will never progress

my \$inf; \$inf=qr/\b(??{\$inf})/;
On a more mundane level we can also compute simple things...
```#simple computations with pattern matching operating on unary numbers
\$"="";

\$_ = 'lll+ll'; #3+2

\$_ = 'lll*ll'; #3*2
@mult = m/l(?=.*\*(l*))/g;
print "\$_ = @mult\n";

\$_ = 'lll';
@sqr = m/(l)(?=(l*))(?=(l*))/g;
print "\$_ squared = @sqr\n";

It is also known that certain cellular automata are universal (Rule 110), and with some additional machinery, we can get a regex to emulate one.
```#implement rule 110 cellular automata
\$init = "  X  ";  #initial value (pad with two spaces on either side)
\$"="";

for (1..24)
{
print ' ' x (38-\$_)."\$init\n";
\$init=" @{[(\$init=~m/^(\s)             | #Start of line
(?<=\s)(\s)(?=\s)     | #000
(?<=\s) \s (?=(X))    | #001
(?<=\s) (X)(?=\s)     | #010
(?<=\s) (X) (?=X)     | #011
(?<=X) (\s)(?=\s)     | #100
(?<=X)  \s (?=(X))    | #101
(?<=X)  (X)(?=\s)     | #110
(?<=X)   X  (?=X+(\s))| #111
(\s)\$                   #End of line
/gx)]} ";
}
Of course that might be considered cheating, because we've used an external mechanism for looping. It gets more interesting when we consider that some Diophantine equations are universal . And a few perl wizards have managed to use regex to solve linear Diophantine equations and test for primeness (see also section 6.16 of the Ram book).

Your mission, should you choose to accept it, is to meditate on this problem and see if you can enlighten the rest of us as to whether perl5 patterns are universal. (Or, as a lesser challenge, see if you can get a regex to compute 2**N, or factorial(N), or Ackermann's function)

-- All code is 100% tested and functional unless otherwise noted.

Replies are listed 'Best First'.
Re: Are Perl patterns universal?
by blokhead (Monsignor) on Nov 09, 2004 at 06:41 UTC
The question is trivial if you allow (?{code}) and (??{code}) constructs, as you can encapsulate arbitrary Perl in the regex.

Do you mean, given an arbitrary Turing machine, can you build a regex \$foo such that \$x =~ /\$foo/ if and only if \$x is accepted by the Turing machine? Using just lookaheads/lookbehinds and captures/backreferences, I'd definitely say no. You don't even get all CFLs this way (consider {a^n b^n : n > 0}) but yet you get some CSLs (consider {a^n b a^n b a^n : n > 0}). See also Perl regular expressions vs. RL, CFL, CSL.

A Turing machine must have unbounded memory. A Perl regex using only captures & backreferences has a bounded (linear in the size of input) amount of "memory" about the its input -- the number of captures is fixed when the regular expression is compiled, and each capture contains at most the entire string. Not only that, but the type of access to this memory is very stricly limited: you can only match substrings. In addition, you don't have write access to this memory in any sort of arbitrary fashion (apart from trying a different substring of the input -- which is hardly arbitrary, and could be viewed as just a form of nondeterminism). This is an essential feature for the power of a Turing machine.

Even allowing (??{\$re}) just gets you all CFLs (and various closures -- intersection and complement, etc), but doesn't allow for universal behavior because the unbounded memory just isn't there.

To do most "interesting" things with regexes, you'll need a layer of Perl around (or inside) the regex, to do either iterated substitutions, multiple matches, or build a different regex for each input. The last approach is a fun one used by various regex-reductions: Hamiltonian Cycle, 3SAT, N-Queens.

A Turing machine must have unbounded memory.
Yeah. But I'm encouraged because of the possibility of infinite recursion (from above).
It is not obvious to me how to do arbitrary memory access in any of the following, but they're all universal.

-- All code is 100% tested and functional unless otherwise noted.
infinite recursion
Recursion with (??{\$re}) is only sufficient for context-free matching. Even primitive recursion requires some sort of argument passing. Pretty much the only thing you can "pass" is the current pos of the string, which is way too restricted: You have only a fixed number of values for pos, and a fixed number of regexes you could be "recursing" to, so you can answer the halting problem for these creatures (see footnote below).
cellular automata
Here the grid of automata must be unbounded. Otherwise, you only have a finite number of possible grid configurations (number of automata states ^ size of grid).
Diophantine equations
The number of variables in the equation is fixed, but their values can be arbitrarily large integers. If their values are bounded, then you only have a finite number of combinations to try (possible values ^ number of variables), and you can always halt while determining if a Diophantine equation has a solution.
cyclical tag systems
This is just like an automaton with a queue -- take from the front and add to the back. But you must allow for rules which increase the size of data in the queue, which can happen indefinitely. If you are not allowed to increase the size of the queue data (or if you have an upper limit on the queue size), you only have a finite number of queue contents and thus configurations of the automaton (number of states * (queue alphabet size ^ max queue size))
SK combinators
I don't pretend to have any special insight on SK combinators. But what you have is a very restricted projection operator K, and a very restricted recursion operator S which still encompasses primitive recursion and μ recursion. The μ recursion is the key part of the universality of general recursive functions, as the value being minimized may grow arbitrarily large.

Footnote: when a system has a finite number (say, N) of possible configurations on a given input, you can answer the halting problem for it as follows (where "halting" means entering some special subset of configurations): Simulate it for N steps. If it hasn't reached a halting configuration by then, it must repeat a configuration. Since the next configuration depends only on the previous configuration, it must be in an infinite loop and thus will never reach a halting configuration. Turing machines have an infinite tape and thus an infinite number of possible configurations.

Clearly if you can answer halting queries on a system, it is not universal (Halting Problem).

Re: Are Perl patterns universal?
by pg (Canon) on Nov 09, 2004 at 02:50 UTC
```"\$_ = 'lll+ll'; #3+2

\$_ = 'lll*ll'; #3*2
@mult = m/l(?=.*\*(l*))/g;
print "\$_ = @mult\n";

\$_ = 'lll';
@sqr = m/(l)(?=(l*))(?=(l*))/g;
print "\$_ squared = @sqr\n";"

Alright ;-) In this case, I would say Perl's string operation can also be used for simple computation: (indeed this is the way how human being started to learn numbers and math at the begining, and when we were little kids, each of us repeated this process)

```my \$a = '111';
my \$b = '11';

print \$a . \$b, "\n"; # 3 + 2
print \$a x length(\$b), "\n"; #3 * 2
print \$a x length(\$a), "\n"; #sqr(3)
Re: Are Perl patterns universal?
by ysth (Canon) on Nov 09, 2004 at 03:11 UTC

We can also craft patterns that loop forever (not merely a long time) (well, ignoring stack overflows that is)

```#Word boundries are zero-width so the regex engine will never progress

my \$inf; \$inf=qr/\b(??{\$inf})/;
Did you try it? See perlre/Repeated patterns matching zero-length substring

Currently, you can mis-use \G to loop forever: 1 while "a" =~ /a|\G/g. Doesn't work in a single list-context ()="a"=~/a|\G/g though.

Did you try it?
running this program...
```my \$inf; \$inf=qr/\b(??{\$inf})/;
print "matched" if ("abc"=~m/\$inf/);
...under perl5.8.4 results in a "Segmentation fault" on my machine.

-- All code is 100% tested and functional unless otherwise noted.

Create A New User
Node Status?
node history
Node Type: perlmeditation [id://406253]
Approved by atcroft
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2020-01-23 14:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
The worst excuse I have ever heard is:

Results (109 votes). Check out past polls.

Notices?