Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Perl and Context Free Grammar

by Abigail-II (Bishop)
on Nov 19, 2003 at 13:02 UTC ( #308285=note: print w/replies, xml ) Need Help??

in reply to Perl and Context Free Grammar

Most of the CFG stuff you'll see in relation to Perl is about parsing, not generating. And then, most of the time, CF will not be mentioned. Regular expression can be used to parse CFG's although it's not clear (yet?) whether they are powerful enough to parse all CFGs (I believe they do, but I've not even started proving it yet). And then there's Parse::RecDescent. Also for parsing, but if you use the right "code blocks", you might be able to do some generating stuff as well.


Replies are listed 'Best First'.
Re: Re: Perl and Context Free Grammar
by gjb (Vicar) on Nov 19, 2003 at 13:45 UTC

    Please make a distinction between regular expressions as known in computer science literature and Perl regular expressions. I know someone like you knows, but many people could get very wrong ideas from a statement such as the above.

    Regular expressions in the computer science sense have a subset of the operators Perl regular expressions have, i.e. concatenation, union, kleene star.

    Regular expressions describe regular languages, context free grammars describe context free languages and it is known (and fairly easy to prove) that reguular languages are a proper subset of context free languages. Hence you can't parse context free language with a regular expression unless it happens to be a regular language.

    Perl regular expressions are more powerful than computer science regular expressions since they've features such as capturing and \1, zero-width assertions and code embedding. It is indeed an open problem what the precise expressive power is.

    Sorry for this piece of pedantry, but IMHO it's an important point to make when addressing a very general audience.

    Just my 2 cents, -gjb-

Re: Re: Perl and Context Free Grammar
by Anonymous Monk on Nov 19, 2003 at 14:20 UTC
    A push-down automata <sp?> (or stack machine) is required to read a Context Free Grammer. The Perl "Regular expression engine" is a stack machine due to the way it processes expressions in () and (?...). I would argue (and take the time to flesh out my logic) that the Perl RE Engine is actually a CFG engine.
      It's not able to parse CFG grammers due to it being a stack machine so it can do backreferencing. I believe it can parse CFGs due to its ability to do "delayed regexes" - the (??{ }) construct. 5.004 for instance also uses a stack to parse regexes, but there's no way you can match balanced parens with a 5.004 regex. But you can make a CFG to match balanced parens.


Re: Re: Perl and Context Free Grammar
by dref (Novice) on Nov 20, 2003 at 21:05 UTC
    You suggest that the perl re-engine is as powerful as a pushdown automata. This assumes that we start breaking into use re 'eval' territory with (?{}) and (??{}). Then we are not only PDA but also turing complete. If we use the possibility to include perl code it is trivially turing complete, just use a TURING_START tag and then run the turing machine in the perl code. Thus for the purpose of this little exercise we will not allow ourself to evaluate anything using (?{}). Only recursive regexs in ??{} blocks. First of all, if we allow ourself the /g modifier combined with while we could always do:
    my @strings = qw(aaabbbccc abc aaaaabbbbbccccc abbccc); print "\nAttempt 1\n"; foreach my $str ( @strings ) { print "$str: "; $_ = $str; while (s/^a(a*)b(b*)c(c*)$/$1$2$3/g) {}; print ($_ ? "Rejected" : "Accepted","\n"); }
    It should be noted that this parses the grammar {a^nb^nc^n | n >= 1) which is beyond the capability of a CFG. Note that if we allow ourselfs to move into "perlspace" we can just say:
    print "\nAttempt 2\n"; my $count; my $re2 = qr/^ (?{ $count = 0}) (a (?{ $count++ }))* (??{"b{".$count."}"}) (??{"c{".$count."}"}) /x; foreach my $str ( @strings ) { print "$str: "; print ($str =~ m/$re2/ ? "Accepted" : "Rejected","\n"); }
    Altough this says nothing, since perl is turing complete (since Acme::Ook exists this is known through the proof that brainfuck is :)) But what if we don't want to allow ourself this? This would still use ??{}. Assuming we allow only a simple recursive regex and not a full perl statement, once again to avoid moving the parser into "perl-space". Let make an attempt:

    1. A pushdown automata equivalence

    In this case we should settle for trying to establish an equivalence with a pushdown automata. To do this it would be enough to establish ourselfs as equivalents of a CFG. This means that we should be able to parse any Context-free grammar G. A formal definition(1):
     G = (V,T,R,S)  where: (T is \Gamma).
        V - An alphabet (finite set)
        T - Terminals (subset of V)
        R - Rules a subset of (V-T)xV* and
        S - Startsymbol, an element of (V-T)

    2. Encoding of a general PDA into a perl regex.

    2.1 Assumptions

    Assume w.l.o.g that V = \w+.

    2.2 Terminals

    2.2.1 Creating symbols for terminals

    Terminals T are actual 'strings' and can be encoded as trivial regexps. They should be named $alpha_T so that for instance the terminal 'a' becomes $alpha_a = qr/a/ and 'b' becomes $alpha_b = qr/b/. $alpha_T = qr/ T /x;

    2.3 Non-terminals

    2.3.1 Rules

    Each rule r_i should be viewed as the tuple (N,L), where L \subset V*. To create the rule regex $rule_n you should juxtapose the letters in L like this:
    $rule_i = qr/ (??{$alpha_v1}) (??{$alpha_v2}) ... (??{$alpha_vn}) /x;

    2.3.2 Creating symbols for non-terminals

    This is the set (V-T) and they are represented by the rules R. For each non-terminal N you should connect all rules (r1..rn) where the first element is N and then construct the alternating rule:
    $alpha_N = qr/ (??{$rule_1}) | (??{$rule_2}) | ... | (??{$rule_n}) /x +;

    2.4 Start symbol

    On of the non-terminals should be named the start-symbol and get special encoding :
    $START = $alpha_N;
    making the start symbol an alias for it.

    2.5 The final regexp

    The regexp G accepting the language is simply $G = qr/ ^ (??{$START}) $ /x;

    3 An example language

    From (pg 116 in (1)):
    Note here the implied \s* after each terminal in T due to a convention in English, you put spaces between words.
     W = {S,A,N,V,P) \union T
     T = {Jim, big, green, cheese, ate}
     R = { P -> N, P -> AP, S -> PVP,     (Rules 1-3)
           A -> big, A->green,            (Rules 4-5) 
           N -> cheese, N-> jim, V-> ate} (Rules 6-8)
    Encodes into:

    # Terminals

    $alpha_Jim = qr/ Jim \s* /x; $alpha_big = qr/ big \s* /x; $alpha_green = qr/ green \s* /x; $alpha_cheese = qr/ cheese \s* /x; $alpha_ate = qr/ ate \s* /x;

    # Rules

    $rule_1 = qr/ (??{$alpha_N}) /x; $rule_2 = qr/ (??{$alpha_A}) (??{$alpha_P})/x; $rule_3 = qr/ (??{$alpha_P}) (??{$alpha_V}) (??{$alpha_P})/x; $rule_4 = qr/ (??{$alpha_big}) /x; $rule_5 = qr/ (??{$alpha_green}) /x; $rule_6 = qr/ (??{$alpha_cheese}) /x; $rule_7 = qr/ (??{$alpha_Jim}) /x; $rule_8 = qr/ (??{$alpha_ate}) /x;

    # Non-terminals

    $alpha_P = qr/ (??{$rule_1}) | (??{$rule_2}) /x; $alpha_S = qr/ (??{$rule_3}) /x; $alpha_A = qr/ (??{$rule_4}) | (??{$rule_5}) /x; $alpha_N = qr/ (??{$rule_6}) | (??{$rule_7}) /x; $alpha_V = qr/ (??{$rule_8}) /x;

    # Start and the language G defined

    $START = $alpha_S; $G = qr/ ^ (??{$START}) $ /x;


    @strings = ('Jim ate cheese','big Jim ate green cheese', 'big cheese ate Jim', 'big cheese ate green green big green big cheese', 'ate cheese','Jim ate ate', 'Jim ate big big big big chees +e'); foreach (@strings) { print "Try: $_ -- ", /$G/ ? "Accepted" : "Rejected", "\n"; }


    This is not a complete proof. I think the mapping is complete but I do not have the time at the moment to prove this. The mapping is also fairly "trivial", but maybe someone will be amused. If nothing else at how much spare time I seem to be having.

    (1) Elements of the theory of computation, Lewis and Papadimitriou

      Ah yes, and I was actually mad enough to make a webpage out of this. Amusing might be my little program that parses BNF-like syntax and creates a program that accepts strings belonging to that syntax. Oh the things we do to avoid working.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://308285]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2021-04-12 02:49 GMT
Find Nodes?
    Voting Booth?

    No recent polls found