Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
You suggest that the perl re-engine is as powerful as a pushdown automata. This assumes that we start breaking into use re 'eval' territory with (?{}) and (??{}). Then we are not only PDA but also turing complete. If we use the possibility to include perl code it is trivially turing complete, just use a TURING_START tag and then run the turing machine in the perl code. Thus for the purpose of this little exercise we will not allow ourself to evaluate anything using (?{}). Only recursive regexs in ??{} blocks. First of all, if we allow ourself the /g modifier combined with while we could always do:
my @strings = qw(aaabbbccc abc aaaaabbbbbccccc abbccc); print "\nAttempt 1\n"; foreach my $str ( @strings ) { print "$str: "; $_ = $str; while (s/^a(a*)b(b*)c(c*)$/$1$2$3/g) {}; print ($_ ? "Rejected" : "Accepted","\n"); }
It should be noted that this parses the grammar {a^nb^nc^n | n >= 1) which is beyond the capability of a CFG. Note that if we allow ourselfs to move into "perlspace" we can just say:
print "\nAttempt 2\n"; my $count; my $re2 = qr/^ (?{ $count = 0}) (a (?{ $count++ }))* (??{"b{".$count."}"}) (??{"c{".$count."}"}) /x; foreach my $str ( @strings ) { print "$str: "; print ($str =~ m/$re2/ ? "Accepted" : "Rejected","\n"); }
Altough this says nothing, since perl is turing complete (since Acme::Ook exists this is known through the proof that brainfuck is :)) But what if we don't want to allow ourself this? This would still use ??{}. Assuming we allow only a simple recursive regex and not a full perl statement, once again to avoid moving the parser into "perl-space". Let make an attempt:

1. A pushdown automata equivalence

In this case we should settle for trying to establish an equivalence with a pushdown automata. To do this it would be enough to establish ourselfs as equivalents of a CFG. This means that we should be able to parse any Context-free grammar G. A formal definition(1):
 G = (V,T,R,S)  where: (T is \Gamma).
    V - An alphabet (finite set)
    T - Terminals (subset of V)
    R - Rules a subset of (V-T)xV* and
    S - Startsymbol, an element of (V-T)

2. Encoding of a general PDA into a perl regex.

2.1 Assumptions

Assume w.l.o.g that V = \w+.

2.2 Terminals

2.2.1 Creating symbols for terminals

Terminals T are actual 'strings' and can be encoded as trivial regexps. They should be named $alpha_T so that for instance the terminal 'a' becomes $alpha_a = qr/a/ and 'b' becomes $alpha_b = qr/b/. $alpha_T = qr/ T /x;

2.3 Non-terminals

2.3.1 Rules

Each rule r_i should be viewed as the tuple (N,L), where L \subset V*. To create the rule regex $rule_n you should juxtapose the letters in L like this:
$rule_i = qr/ (??{$alpha_v1}) (??{$alpha_v2}) ... (??{$alpha_vn}) /x;

2.3.2 Creating symbols for non-terminals

This is the set (V-T) and they are represented by the rules R. For each non-terminal N you should connect all rules (r1..rn) where the first element is N and then construct the alternating rule:
$alpha_N = qr/ (??{$rule_1}) | (??{$rule_2}) | ... | (??{$rule_n}) /x +;

2.4 Start symbol

On of the non-terminals should be named the start-symbol and get special encoding :
$START = $alpha_N;
making the start symbol an alias for it.

2.5 The final regexp

The regexp G accepting the language is simply $G = qr/ ^ (??{$START}) $ /x;

3 An example language

From (pg 116 in (1)):
Note here the implied \s* after each terminal in T due to a convention in English, you put spaces between words.
 W = {S,A,N,V,P) \union T
 T = {Jim, big, green, cheese, ate}
 R = { P -> N, P -> AP, S -> PVP,     (Rules 1-3)
       A -> big, A->green,            (Rules 4-5) 
       N -> cheese, N-> jim, V-> ate} (Rules 6-8)
Encodes into:

# Terminals

$alpha_Jim = qr/ Jim \s* /x; $alpha_big = qr/ big \s* /x; $alpha_green = qr/ green \s* /x; $alpha_cheese = qr/ cheese \s* /x; $alpha_ate = qr/ ate \s* /x;

# Rules

$rule_1 = qr/ (??{$alpha_N}) /x; $rule_2 = qr/ (??{$alpha_A}) (??{$alpha_P})/x; $rule_3 = qr/ (??{$alpha_P}) (??{$alpha_V}) (??{$alpha_P})/x; $rule_4 = qr/ (??{$alpha_big}) /x; $rule_5 = qr/ (??{$alpha_green}) /x; $rule_6 = qr/ (??{$alpha_cheese}) /x; $rule_7 = qr/ (??{$alpha_Jim}) /x; $rule_8 = qr/ (??{$alpha_ate}) /x;

# Non-terminals

$alpha_P = qr/ (??{$rule_1}) | (??{$rule_2}) /x; $alpha_S = qr/ (??{$rule_3}) /x; $alpha_A = qr/ (??{$rule_4}) | (??{$rule_5}) /x; $alpha_N = qr/ (??{$rule_6}) | (??{$rule_7}) /x; $alpha_V = qr/ (??{$rule_8}) /x;

# Start and the language G defined

$START = $alpha_S; $G = qr/ ^ (??{$START}) $ /x;


@strings = ('Jim ate cheese','big Jim ate green cheese', 'big cheese ate Jim', 'big cheese ate green green big green big cheese', 'ate cheese','Jim ate ate', 'Jim ate big big big big chees +e'); foreach (@strings) { print "Try: $_ -- ", /$G/ ? "Accepted" : "Rejected", "\n"; }


This is not a complete proof. I think the mapping is complete but I do not have the time at the moment to prove this. The mapping is also fairly "trivial", but maybe someone will be amused. If nothing else at how much spare time I seem to be having.

(1) Elements of the theory of computation, Lewis and Papadimitriou

In reply to Re: Re: Perl and Context Free Grammar by dref
in thread Perl and Context Free Grammar by timothy

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (4)
    As of 2021-04-12 04:18 GMT
    Find Nodes?
      Voting Booth?

      No recent polls found