timothy has asked for the wisdom of the Perl Monks concerning the following question:
Replies are listed 'Best First'.
Re: Perl and Context Free Grammar
by Abigail-II (Bishop) on Nov 19, 2003 at 13:02 UTC
Abigail
by gjb (Vicar) on Nov 19, 2003 at 13:45 UTC
Please make a distinction between regular expressions as known in the computer science literature and Perl regular expressions. I know someone like you knows, but many people could get very wrong ideas from a statement such as the above.

Regular expressions in the computer science sense have only a subset of the operators that Perl regular expressions have, i.e. concatenation, union and Kleene star. Regular expressions describe regular languages, context-free grammars describe context-free languages, and it is known (and fairly easy to prove) that the regular languages are a proper subset of the context-free languages. Hence you can't parse a context-free language with a regular expression unless it happens to be a regular language.

Perl regular expressions are more powerful than computer science regular expressions, since they have features such as capturing and \1, zero-width assertions and code embedding. It is indeed an open problem what their precise expressive power is.

Sorry for this piece of pedantry, but IMHO it's an important point to make when addressing a very general audience.

Just my 2 cents, -gjb-
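To make the gap in expressive power concrete, here is a minimal sketch (not taken from any of the posts; the helper name is_doubled is made up for illustration). A single backreference is enough to recognise the "copy language" { ww }, which is known to be not even context-free, let alone regular:

```perl
use strict;
use warnings;

# Returns 1 if the string is some non-empty string written twice in a row
# (the language { ww }), 0 otherwise. The \1 backreference does the work.
sub is_doubled {
    my ($s) = @_;
    return $s =~ /\A(.+)\1\z/s ? 1 : 0;
}

print "$_: ", is_doubled($_), "\n" for qw(abab abcabc abba);
```

Without the backreference, no regular expression in the textbook sense can check that the second half repeats the first.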
by Anonymous Monk on Nov 19, 2003 at 14:20 UTC
by Abigail-II (Bishop) on Nov 19, 2003 at 15:00 UTC
Abigail
by dref (Novice) on Nov 20, 2003 at 21:05 UTC
It should be noted that this parses the language {a^n b^n c^n | n >= 1}, which is beyond the capability of a CFG. Note that if we allow ourselves to move into "perlspace" we can just say:

Although this says nothing, since Perl is Turing complete (since Acme::Ook exists, this is known through the proof that brainfuck is :))

But what if we don't want to allow ourselves this? This would still use ??{}. Assume we allow only a simple recursive regex and not a full Perl statement, once again to avoid moving the parser into "perl-space". Let's make an attempt:

1. A pushdown automaton equivalence

In this case we should settle for trying to establish an equivalence with a pushdown automaton. To do this it would be enough to establish ourselves as the equivalent of a CFG. This means that we should be able to parse any context-free grammar G. A formal definition (1):

G = (V,T,R,S) where (T is \Gamma):
V - an alphabet (finite set)
T - terminals (a subset of V)
R - rules, a subset of (V-T) x V*
S - start symbol, an element of (V-T)

2. Encoding of a general PDA into a Perl regex

2.1 Assumptions

Assume w.l.o.g. that V = \w+.

2.2 Terminals

2.2.1 Creating symbols for terminals

Terminals T are actual 'strings' and can be encoded as trivial regexps. They should be named $alpha_T so that, for instance, the terminal 'a' becomes $alpha_a = qr/a/ and 'b' becomes $alpha_b = qr/b/.

$alpha_T = qr/ T /x;

2.3 Non-terminals

2.3.1 Rules

Each rule r_i should be viewed as the tuple (N,L), where L \in V*. To create the rule regex $rule_n you should juxtapose the letters v1..vn in L like this:
2.3.2 Creating symbols for non-terminals

This is the set (V-T), and they are represented by the rules R. For each non-terminal N you should collect all rules (r1..rn) whose first element is N and then construct the alternating rule:
2.4 Start symbol

One of the non-terminals should be named the start symbol and get a special encoding:

making the start symbol an alias for it.

2.5 The final regexp

The regexp G accepting the language is simply

$G = qr/ ^ (??{$START}) $ /x;

3. An example language

From (pg 116 in (1)):

Note here the implied \s* after each terminal in T, due to the English convention of putting spaces between words.

W = {S,A,N,V,P} \union T
T = {Jim, big, green, cheese, ate}
R = { P -> N, P -> AP, S -> PVP,  (Rules 1-3)
      A -> big, A -> green,       (Rules 4-5)
      N -> cheese, N -> Jim, V -> ate }  (Rules 6-8)

Encodes into:

# Terminals
# Rules
# Non-terminals
# Start and the language G defined
#Tests
Conclusion

This is not a complete proof. I think the mapping is complete, but I do not have the time at the moment to prove this. The mapping is also fairly "trivial", but maybe someone will be amused, if nothing else at how much spare time I seem to be having.

(1) Elements of the Theory of Computation, Lewis and Papadimitriou
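For anyone who wants to try the construction, here is one way the example grammar could be encoded following the naming scheme described above. This is a sketch under assumptions, not the code from the post: the use of package variables (our), the \A and \z anchors and the helper accepts are choices made here for illustration.

```perl
use strict;
use warnings;

# Package variables, so the deferred (??{ ... }) blocks pick up the current
# value at match time, even for symbols that are assigned further down.
our ($A, $N, $V, $P, $S, $START, $G);
our ($rule_1, $rule_2, $rule_3, $rule_4, $rule_5, $rule_6, $rule_7, $rule_8);

# Terminals, each with the implied \s* after the word
our $alpha_Jim    = qr/Jim\s*/;
our $alpha_big    = qr/big\s*/;
our $alpha_green  = qr/green\s*/;
our $alpha_cheese = qr/cheese\s*/;
our $alpha_ate    = qr/ate\s*/;

# Rules: juxtapose the symbols of each right-hand side
$rule_1 = qr/(??{$N})/;                    # P -> N
$rule_2 = qr/(??{$A})(??{$P})/;            # P -> AP
$rule_3 = qr/(??{$P})(??{$V})(??{$P})/;    # S -> PVP
$rule_4 = qr/$alpha_big/;                  # A -> big
$rule_5 = qr/$alpha_green/;                # A -> green
$rule_6 = qr/$alpha_cheese/;               # N -> cheese
$rule_7 = qr/$alpha_Jim/;                  # N -> Jim
$rule_8 = qr/$alpha_ate/;                  # V -> ate

# Non-terminals: alternation of the rules that share a left-hand side
$P = qr/$rule_1|$rule_2/;
$S = qr/$rule_3/;
$A = qr/$rule_4|$rule_5/;
$N = qr/$rule_6|$rule_7/;
$V = qr/$rule_8/;

# Start symbol and the final regexp
$START = $S;
$G     = qr/\A(??{$START})\z/;

# Returns 1 if the sentence is in the language, 0 otherwise
sub accepts { return $_[0] =~ $G ? 1 : 0 }

print "$_: ", accepts($_), "\n"
    for 'Jim ate cheese', 'big Jim ate green cheese', 'Jim cheese';
```

Note that this grammar has no left recursion. A left-recursive rule such as P -> PA would recurse without consuming any input, so a direct encoding like this one would not be expected to work for it.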
by dref (Novice) on Nov 28, 2003 at 11:47 UTC
Re: Perl and Context Free Grammar
by blokhead (Monsignor) on Nov 19, 2003 at 14:11 UTC
There are better algorithms out there. A quick googling found this paper, which does a good job of explaining why the above method isn't that great and presents a much better algorithm. Either of these may make a good starting point for your project.

blokhead
Re: Perl and Context Free Grammar
by gjb (Vicar) on Nov 19, 2003 at 14:56 UTC
I just remembered that I actually have Perl code doing this. It's from a research project, so there's not much comment, and it was dashed off quickly, so I'm sure it can be optimized a lot. The code is intended to generate all strings described by the CFG up to a given length. In your case, you'll probably want to include some probabilities when choosing alternatives.

Hope this helps, -gjb-
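As a rough illustration of the idea (this is not gjb's code; the grammar representation and the name sentences_up_to are assumptions made for this sketch), all sentences up to a given length can be enumerated by expanding the leftmost non-terminal breadth-first and discarding any form that is already too long. It assumes the grammar has no empty productions, so a form never shrinks.

```perl
use strict;
use warnings;

# Hypothetical representation: non-terminal => list of right-hand sides,
# each a list of symbols. Any symbol that is not a hash key is a terminal.
my %grammar = (
    S => [ [qw(P V P)] ],
    P => [ [qw(N)], [qw(A P)] ],
    A => [ [qw(big)], [qw(green)] ],
    N => [ [qw(cheese)], [qw(Jim)] ],
    V => [ [qw(ate)] ],
);

# Return every sentence of at most $max words derivable from $start.
sub sentences_up_to {
    my ($grammar, $start, $max) = @_;
    my @queue = ([$start]);
    my (%seen, %found);
    while (my $form = shift @queue) {
        # Index of the leftmost non-terminal, if any
        my ($i) = grep { exists $grammar->{ $form->[$_] } } 0 .. $#$form;
        if (!defined $i) {
            $found{"@$form"} = 1;    # only terminals left: a sentence
            next;
        }
        for my $rhs (@{ $grammar->{ $form->[$i] } }) {
            my @next = (@$form[0 .. $i - 1], @$rhs, @$form[$i + 1 .. $#$form]);
            next if @next > $max;          # too long, can never shrink again
            next if $seen{"@next"}++;      # already expanded this form
            push @queue, \@next;
        }
    }
    return sort keys %found;
}

print "$_\n" for sentences_up_to(\%grammar, 'S', 3);
```

The %seen check also keeps the loop from running forever on cyclic unit rules such as A -> B, B -> A.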
Re: Perl and Context Free Grammar
by Molt (Chaplain) on Nov 19, 2003 at 13:36 UTC
Text generation with Perl
by TheDamian (Vicar) on Nov 19, 2003 at 19:22 UTC
Re: Perl and Context Free Grammar
by mattr (Curate) on Nov 20, 2003 at 06:32 UTC
The links other people have posted are probably what you wanted, but maybe there is some more information in here. Unfortunately I don't have a degree in computational linguistics, though lately I think it would be nice to have! I've been scouring the web for natural language processing tools to use along with Perl in real-world applications. These links may be of use. At least in my limited understanding, some of the ones that say they use lexical functional grammars in text generation may be pertinent.

A list of projects for algorithmic sentence generation I came across yesterday. Some detective work required.
A number of links under shallow and deep text generation (not from a CFG exactly) at the Natural Language Software Registry.
FGW (Functional Grammar Workbench).
functional grammar homebase and links to functional discourse grammar.
FUF / SURGE system as mentioned on this page. From Ben Gurion, and listed in the NLSR above.
A short, basic page for other people, with pseudocode discussing the generation of strings of terminal symbols from a CFG.

Matt R.
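For completeness, generating a random string of terminal symbols from a CFG can be sketched in a few lines of Perl. Everything here is an assumption for illustration (the weighted grammar layout, the weights themselves and the name random_sentence); it is not the pseudocode from the linked page.

```perl
use strict;
use warnings;

# Hypothetical weighted grammar: non-terminal => list of [weight, symbols...].
# Any symbol that is not a hash key is a terminal.
my %weighted = (
    S => [ [1, qw(P V P)] ],
    P => [ [2, qw(N)], [1, qw(A P)] ],
    A => [ [1, qw(big)], [1, qw(green)] ],
    N => [ [1, qw(cheese)], [1, qw(Jim)] ],
    V => [ [1, qw(ate)] ],
);

# Expand $symbol recursively, choosing each alternative with probability
# proportional to its weight, and return the resulting words joined by spaces.
sub random_sentence {
    my ($grammar, $symbol) = @_;
    my $alts = $grammar->{$symbol};
    return $symbol unless $alts;           # terminal: emit as is

    my $total = 0;
    $total += $_->[0] for @$alts;
    my $pick   = rand($total);
    my $chosen = $alts->[-1];              # fallback, normally overwritten
    for my $alt (@$alts) {
        if (($pick -= $alt->[0]) < 0) { $chosen = $alt; last }
    }
    my (undef, @rhs) = @$chosen;
    return join ' ', map { random_sentence($grammar, $_) } @rhs;
}

print random_sentence(\%weighted, 'S'), "\n" for 1 .. 5;
```

Giving the non-recursive alternative P -> N the larger weight keeps the expected sentence length small. With the weights the other way round, sentences would tend to grow much longer.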