Re: Re: Perl and Context Free Grammar

by dref (Novice)
on Nov 20, 2003 at 21:05 UTC (#308707)


in reply to Re: Perl and Context Free Grammar
in thread Perl and Context Free Grammar

You suggest that the Perl regex engine is as powerful as a pushdown automaton. This assumes that we start breaking into use re 'eval' territory with (?{}) and (??{}). Then we are not only a PDA but also Turing complete. If we use the possibility to include Perl code, it is trivially Turing complete: just use a TURING_START tag and then run the Turing machine in the Perl code. Thus, for the purpose of this little exercise, we will not allow ourselves to evaluate anything using (?{}); only recursive regexes in (??{}) blocks are allowed. First of all, if we allow ourselves the /g modifier combined with while, we could always do:

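my @strings = qw(aaabbbccc abc aaaaabbbbbccccc abbccc);
print "\nAttempt 1\n";
foreach my $str ( @strings ) {
    print "$str: ";
    $_ = $str;
    # Strip one a, one b and one c per pass until the pattern stops matching.
    while (s/^a(a*)b(b*)c(c*)$/$1$2$3/g) {}
    # Accept only if nothing is left over (length avoids treating "0" as empty).
    print (length($_) ? "Rejected" : "Accepted", "\n");
}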
It should be noted that this recognizes the language {a^n b^n c^n | n >= 1}, which is beyond the capability of a CFG. Note that if we allow ourselves to move into "perl-space" we can just say:
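print "\nAttempt 2\n";
my $count;
# Count the a's, then demand exactly that many b's and c's.
# The + (at least one a) and the closing $ keep the match to n >= 1 and to the whole string.
my $re2 = qr/^
    (?{ $count = 0 })
    (a (?{ $count++ }))+
    (??{ "b{" . $count . "}" })
    (??{ "c{" . $count . "}" })
    $ /x;
foreach my $str ( @strings ) {
    print "$str: ";
    print ($str =~ m/$re2/ ? "Accepted" : "Rejected", "\n");
}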
Although this says nothing, since Perl itself is Turing complete (since Acme::Ook exists, this is known through the proof that brainfuck is :)). But what if we don't want to allow ourselves this? We would still use (??{}), but assume we allow only a simple recursive regex in it and not a full Perl statement, once again to avoid moving the parser into "perl-space". Let us make an attempt:
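As a warm-up, here is a minimal sketch of what a "simple recursive regex" means in this context: a pattern whose (??{}) block does nothing but hand back another regex, in this case itself. The names $anbn and matches_anbn are illustrative and not part of the original post; the pattern lives in a package variable (our) so that the deferred block can see it at match time.

```perl
use strict;
use warnings;

# a^n b^n (n >= 1): one 'a', optionally a nested copy of the whole pattern, one 'b'.
# (??{ ... }) is evaluated at match time, so $anbn is already defined by then.
our $anbn;
$anbn = qr/ a (?: (??{ $anbn }) )? b /x;

# Hypothetical helper: anchor the recursive pattern to the whole string.
sub matches_anbn {
    my ($str) = @_;
    return $str =~ /^$anbn$/ ? 1 : 0;
}

for my $s (qw(ab aabb aaabbb aab abb ba)) {
    print "$s: ", (matches_anbn($s) ? "Accepted" : "Rejected"), "\n";
}
```

The first three strings are accepted and the last three rejected. Nothing inside the block counts or stores state; the nesting of the recursion does the work a PDA's stack would do.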

1. A pushdown automaton equivalence

In this case we should settle for trying to establish an equivalence with a pushdown automaton. Since PDAs and CFGs accept exactly the same class of languages, it is enough to establish ourselves as equivalent to a CFG. This means that we should be able to parse any context-free grammar G. A formal definition (1):
 G = (V,T,R,S)  where: (T is \Gamma).
    V - An alphabet (finite set)
    T - Terminals (subset of V)
    R - Rules, a finite subset of (V-T) x V*
    S - Start symbol, an element of (V-T)
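
To make the definition concrete, the tuple can be written down as a plain Perl data structure. This is only a sketch: the hash layout, the sample grammar S -> a S b | a b, and the helper is_well_formed are assumptions made for illustration, not something from the original post or from (1). V is left implicit as the union of the terminals and the rule heads.

```perl
use strict;
use warnings;

# Hypothetical representation of G = (V, T, R, S).
my %G = (
    T => [qw(a b)],                       # terminals
    R => {                                # rules, keyed by non-terminal
        S => [ [qw(a S b)], [qw(a b)] ],  # S -> a S b | a b
    },
    S => 'S',                             # start symbol
);

# Check the side conditions of the definition: rule heads are in V-T,
# rule bodies are strings over V, and the start symbol is a non-terminal.
sub is_well_formed {
    my ($g) = @_;
    my %is_terminal = map { ($_ => 1) } @{ $g->{T} };
    my %is_nonterm  = map { ($_ => 1) } keys %{ $g->{R} };
    return 0 if grep { $is_terminal{$_} } keys %is_nonterm;  # T and V-T are disjoint
    return 0 unless $is_nonterm{ $g->{S} };
    for my $bodies (values %{ $g->{R} }) {
        for my $body (@$bodies) {
            return 0 if grep { !$is_terminal{$_} && !$is_nonterm{$_} } @$body;
        }
    }
    return 1;
}

print is_well_formed(\%G) ? "well formed\n" : "malformed\n";
```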

2. Encoding of a general CFG into a Perl regex

2.1 Assumptions

Assume w.l.o.g. that every symbol in V is named by a word matching \w+.

2.2 Terminals

2.2.1 Creating symbols for terminals

Terminals T are actual 'strings' and can be encoded as trivial regexps. They should be named $alpha_T, so that for instance the terminal 'a' becomes $alpha_a = qr/a/ and 'b' becomes $alpha_b = qr/b/. In general:
$alpha_T = qr/ T /x;

2.3 Non-terminals

2.3.1 Rules

Each rule r_i should be viewed as the tuple (N,L), where N is a non-terminal in (V-T) and L is a string in V*. To create the rule regex $rule_i you should juxtapose the letters v1..vn of L like this:
$rule_i = qr/ (??{$alpha_v1}) (??{$alpha_v2}) ... (??{$alpha_vn}) /x;

2.3.2 Creating symbols for non-terminals

This is the set (V-T), and its members are represented by the rules R. For each non-terminal N you should collect all rules (r1..rn) whose first element is N and then construct the alternation:
$alpha_N = qr/ (??{$rule_1}) | (??{$rule_2}) | ... | (??{$rule_n}) /x;

2.4 Start symbol

One of the non-terminals should be named the start symbol and gets a special encoding:
$START = $alpha_N;
making the start symbol an alias for it.

2.5 The final regexp

The regexp G accepting the language is simply
$G = qr/ ^ (??{$START}) $ /x;
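
Steps 2.2 to 2.5 are mechanical, so they can be sketched as a small generator. Everything named here (%alpha, sym_ref, compile_grammar, the hash layout of the rules) is hypothetical and not from the original post. The sketch folds the separate $rule_i variables into the alternation of each non-terminal, builds the patterns from strings (hence use re 'eval'), and assumes symbol names are plain identifiers and that the grammar has no left-recursive rules, since those would recurse without consuming input.

```perl
use strict;
use warnings;

# Symbol name -> compiled regex. A package variable, so that the deferred
# (??{ ... }) blocks can look symbols up at match time.
our %alpha;

# Source text of a deferred reference to one symbol, e.g. "(??{ $alpha{P} })".
sub sym_ref {
    my ($name) = @_;
    return "(??{ \$alpha{$name} })";
}

# Hypothetical helper: $terminals is an array ref of words, $rules maps each
# non-terminal to a list of rule bodies (array refs of symbol names).
sub compile_grammar {
    my ($terminals, $rules, $start) = @_;
    use re 'eval';    # required because the patterns are assembled from strings

    %alpha = ();
    # 2.2: terminals become trivial regexes (with an optional trailing \s*).
    for my $t (@$terminals) {
        $alpha{$t} = qr/\Q$t\E\s*/;
    }
    # 2.3: each non-terminal is the alternation of its juxtaposed rule bodies.
    for my $n (keys %$rules) {
        my @alts;
        for my $body (@{ $rules->{$n} }) {
            push @alts, join ' ', map { sym_ref($_) } @$body;
        }
        my $src = join ' | ', @alts;
        $alpha{$n} = qr/$src/x;
    }
    # 2.4 and 2.5: anchor the start symbol.
    my $top = '^ ' . sym_ref($start) . ' $';
    return qr/$top/x;
}

my $G = compile_grammar(
    [qw(Jim big green cheese ate)],
    {
        S => [ [qw(P V P)] ],
        P => [ [qw(N)], [qw(A P)] ],
        A => [ [qw(big)], [qw(green)] ],
        N => [ [qw(cheese)], [qw(Jim)] ],
        V => [ [qw(ate)] ],
    },
    'S',
);

for my $s ('Jim ate cheese', 'big Jim ate green cheese', 'ate cheese') {
    print "Try: $s -- ", ($s =~ $G ? "Accepted" : "Rejected"), "\n";
}
```

Because every symbol is looked up in %alpha only when the match reaches it, the order in which the entries are created does not matter, which is the same property the hand-written encoding relies on.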

3 An example language

From (pg 116 in (1)):
Note the implied \s* after each terminal in T: by English convention, words are separated by spaces.
 W = {S,A,N,V,P} \union T
 T = {Jim, big, green, cheese, ate}
 R = { P -> N, P -> AP, S -> PVP,       (Rules 1-3)
       A -> big, A -> green,            (Rules 4-5)
       N -> cheese, N -> Jim, V -> ate} (Rules 6-8)
Encodes into:

# Terminals

$alpha_Jim    = qr/ Jim \s* /x;
$alpha_big    = qr/ big \s* /x;
$alpha_green  = qr/ green \s* /x;
$alpha_cheese = qr/ cheese \s* /x;
$alpha_ate    = qr/ ate \s* /x;

# Rules

$rule_1 = qr/ (??{$alpha_N}) /x;
$rule_2 = qr/ (??{$alpha_A}) (??{$alpha_P})/x;
$rule_3 = qr/ (??{$alpha_P}) (??{$alpha_V}) (??{$alpha_P})/x;
$rule_4 = qr/ (??{$alpha_big}) /x;
$rule_5 = qr/ (??{$alpha_green}) /x;
$rule_6 = qr/ (??{$alpha_cheese}) /x;
$rule_7 = qr/ (??{$alpha_Jim}) /x;
$rule_8 = qr/ (??{$alpha_ate}) /x;

# Non-terminals

$alpha_P = qr/ (??{$rule_1}) | (??{$rule_2}) /x;
$alpha_S = qr/ (??{$rule_3}) /x;
$alpha_A = qr/ (??{$rule_4}) | (??{$rule_5}) /x;
$alpha_N = qr/ (??{$rule_6}) | (??{$rule_7}) /x;
$alpha_V = qr/ (??{$rule_8}) /x;

# Start and the language G defined

$START = $alpha_S;
$G = qr/ ^ (??{$START}) $ /x;

#Tests

@strings = ('Jim ate cheese',
            'big Jim ate green cheese',
            'big cheese ate Jim',
            'big cheese ate green green big green big cheese',
            'ate cheese',
            'Jim ate ate',
            'Jim ate big big big big cheese');
foreach (@strings) {
    print "Try: $_ -- ", /$G/ ? "Accepted" : "Rejected", "\n";
}
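
For comparison only, and as an assumption about the reader's Perl rather than anything in the 2003 post: Perl 5.10 and later also provide named recursion via (?(DEFINE) ...) and (?&name), which can express the same example grammar in a single pattern without any (??{}) blocks. This is a different mechanism from the construction above; the name $G_define is illustrative.

```perl
use strict;
use warnings;

# Same example grammar, written with named subpatterns instead of (??{}).
# The (?(DEFINE) ...) group never matches anything itself; it only declares
# the subpatterns that (?&name) calls recursively.
my $G_define = qr{
    ^ (?&S) $
    (?(DEFINE)
        (?<S> (?&P) (?&V) (?&P) )      # rule 3
        (?<P> (?&N) | (?&A) (?&P) )    # rules 1-2
        (?<A> big \s* | green \s* )    # rules 4-5
        (?<N> cheese \s* | Jim \s* )   # rules 6-7
        (?<V> ate \s* )                # rule 8
    )
}x;

for my $s ('Jim ate cheese', 'big Jim ate green cheese', 'Jim ate ate') {
    print "Try: $s -- ", ($s =~ $G_define ? "Accepted" : "Rejected"), "\n";
}
```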

Conclusion

This is not a complete proof. I think the mapping is complete, but I do not have the time at the moment to prove this. The mapping is also fairly "trivial", but maybe someone will be amused, if by nothing else then by how much spare time I seem to have.

(1) Lewis and Papadimitriou, Elements of the Theory of Computation


Re: Re: Re: Perl and Context Free Grammar
by dref (Novice) on Nov 28, 2003 at 11:47 UTC
    Ah yes, and I was actually mad enough to make a webpage out of this. http://www.dunata.com/science/compsci/perl-re.php. Amusing might be my little program that parses BNF-like syntax and creates a program that accepts strings belonging to that syntax. Oh the things we do to avoid working.
