Being Forced to Fork with Nested Regular Expressions

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

A few days ago I started to again work on implementing predicate logic in Perl. I know it can be done. I just need that epiphany that makes me slap my forehead. I stopped working on the latest implementation because it was awful, but I kept thinking about the problem. Basically, to get anything similar to Prolog in Perl, I need backtracking. The only thing in Perl which implements backtracking is a regular expression. I'm pretty sure the key lies there.

This morning, I thought I had the epiphany. What if, given a set of facts, I would like to know who owns gold?

my $facts = <<END_FACTS;
owns merlyn gold
owns Ovid books
gives Ovid books kudra
owns kudra gold
END_FACTS
                                                                      
+                                                                     
+                  
my $query = qr/owns (\S+) gold/;
my @who = $facts =~ /$query/mg;
[download]

That assigns "merlyn" and "kudra" to @who, but it's not really useful. What if I want the query to be something like "who owns something valuable?" I can do this:

use Data::Dumper;
use constant SUCCEED => qr{(?=)};
use constant FAIL    => qr{(?!)};                                     
+                                                                     
+                                                  

my $owns = <<END_FACTS;
owns merlyn gold
owns Ovid books
owns kudra gold
END_FACTS
                                                                      
+                                                                     
+                 
my $valuable = <<END_VALUABLE;
valuable Perl
valuable gold
valuable Corinna
END_VALUABLE                                                          
+                                                                     
+                             

my $is_valuable = qr[(\S+)(??{match($valuable, "valuable $2")})];
my $owns_query = qr/owns (\S+) $is_valuable/;

# here's the magic!
my @who = $owns =~ /$owns_query/mg;

print Dumper \@who;                                                   
+                                                                     
+                                    

sub match {
    my ($facts, $query) = @_;
    my $sleep_count = 0;
    my $pid;
    do {
        $pid = open KID_TO_READ, "-|";
        unless (defined $pid) {
            die "Couldn't fork" if $sleep_count++ > 6;
            sleep 1;
        }
    } until defined $pid;                                             
+                                                                     
+                                          

    unless ($pid) { # child
        if ($facts =~ $query) {
            print SUCCEED;
        }
        else {
            print FAIL;
        }
    }
    else { # parent
        my $result = <KID_TO_READ>;
        close KID_TO_READ;
        return $result;
    }
}
[download]

The prints out:

$VAR1 = [
          'merlyn',
          'gold',
          'kudra',
          'gold'
        ];
[download]

That's great. Once I can autogenerate those regexes and clean up the output I can wrap this up in a pretty framework and do all of the predicate logic programming I want.

Uh, no I can't. Regular expressions are not reentrant, thus forcing me to fork off a new process for every regex "nested" inside of the main query (such as the "is_valuable" condition). This winds up being so ridiculously slow that it's probably not worth much more than a toy. I also suspect that a complicated query will suck memory like a Hoover. Any suggestions to get around this, or am I stuck?

Cheers,
Ovid

New address of my CGI Course.

Comment on Being Forced to Fork with Nested Regular Expressions Select or Download Code

Replies are listed 'Best First'.
Re: Being Forced to Fork with Nested Regular Expressions by paulbort (Hermit) on Jan 02, 2004 at 18:00 UTC
I know that it really doesn't address your topic, but I think the reason that you're running into the problem is that it's the wrong approach. An impressive (ab)use of regular expressions, though. What I've seen of predicate systems leads me to a database approach rather than regular expressions. Given the right data structure, you need to write two parsers: one to convert $facts into rows in tables, and one to convert your queries into SQL. With the right data structure, this is straightforward. Most of this can be done with three tables: One to contain people, things and attributes (Merlyn, Ovid, Kudra, books, gold, valuable), one to contain verbs ( owns, is ), and one to contain tuples of person/thing/attribute, verb, person/thing/attribute. This leads to something like: `SELECT actor FROM relations WHERE verb = 'owns' AND acted_on IN ( SELECT actor FROM relations WHERE verb = 'is' AND acted_on = 'valuable' );` [download] This query, with proper data, can answer the question of 'who owns something valuable' in a very reasonable amount of time. You still do code generation, but now it's SQL instead of RE. HTH, Paul -- Spring: Forces, Coiled Again!	[reply] [d/l]
Re: Re: Being Forced to Fork with Nested Regular Expressions by Ovid (Cardinal) on Jan 02, 2004 at 18:11 UTC
With DBD::SQLite, this might be a viable option if I can figure out how to structure this. I've seen this discussed before and, if it works, I can push the hard part of matching down to the C level and get a huge performance boost. I'll have to see how robust the SQLite SQL implementation is. As I recall, it's pretty good (and here's an instance where I think a typeless database like SQLite is fine). Cheers, Ovid New address of my CGI Course.	[reply]
Re: Re: Re: Being Forced to Fork with Nested Regular Expressions by paulbort (Hermit) on Jan 02, 2004 at 20:43 UTC
Yeah, typeless is no problem in this case. I don't know SQLite at all, but as long as it can do sub-selects ( the IN ( SELECT ...) part ), it should be fine. I'm not sure when you say 'structure this' if you mean the database, or the Perl code. The Perl code probably needs to be set up as a shell, where you can make assertions (which will be written to the database) ask questions (which will query the database), and do housekeeping (save db, restore db, wipe db, quit, etc.) Like a PROLOG interpreter. The only interesting thing I'd worry about with the database to start with would be using serial numbers instead of names when describing relationships, so that you can rename things if necessary. (Without this, renaming breaks all relationships.) Next might include quantities ( book worth 5 shekels ). Note that you don't need to add classes to the database, you can simply add the 'is a' relationship, and ( 1984 is a book ) works. -- Spring: Forces, Coiled Again!	[reply]
Re: Re: Being Forced to Fork with Nested Regular Expressions by Ovid (Cardinal) on Jan 02, 2004 at 21:41 UTC
A bit of thought tells me that there are some problems with this approach. Basically, you have a main table as follows: `+----------+ \| Facts \| +----------+ \| verb \| \| subject \| \| object \| +----------+` That limits me to simple things such as "owns Ovid gold". However, what if I want to express "gives Ovid books kudra"? In this case, there's an implied prepositional phrase "to kudra" that doesn't fit and this tremendously limits the utility of this approach. My other option would be to create a recursive tree structure with arbitrary depth. I've done that before and it's not fun, but I can see no other way to approach this. Cheers, Ovid New address of my CGI Course.	[reply]
Re: Re: Re: Being Forced to Fork with Nested Regular Expressions by paulbort (Hermit) on Jan 05, 2004 at 17:46 UTC
Your OP didn't mention prepositions, so I didn't even think about them. At that point you're really looking for something more like a natural language parser. A quick wander through CPAN finds Lingua::EN::Tagger which might help. The post giving an example of a Prolog interface from Perl is also worth looking at, as all the heavy lifting has already been done in Prolog. Something else that might help would be to look at the MUSH code, which had a rich object-oriented environment, and could at least store the kind of state information you're describing, and much more. The queries would be interesting. PennMUSH seems to be the current MUSH implementation of choice. -- Spring: Forces, Coiled Again!	[reply]
Re: Being Forced to Fork with Nested Regular Expressions by adrianh (Chancellor) on Jan 02, 2004 at 17:31 UTC
Any suggestions to get around this, or am I stuck? With regexes I think you'll remain stuck. You'll also restrict yourself to assertions about simple strings which I imagine you'll quickly find restrictive. Personally I think using the call stack to implement your backtracking will be a lot easier - but I would say that wouldn't I :-)	[reply]
Re: Re: Being Forced to Fork with Nested Regular Expressions by Ovid (Cardinal) on Jan 02, 2004 at 18:07 UTC
I once tried to write a generic wrapper using your code, but I failed. I probably just didn't take enough time to play with it, but that was about the point where I stopped working on this last time. Cheers, Ovid New address of my CGI Course.	[reply]
Re^3: Being Forced to Fork with Nested Regular Expressions by adrianh (Chancellor) on Jan 16, 2004 at 22:46 UTC
Somewhat belated reply ;-) Even if you don't use the call stack, I think that the regex engine is probably the wrong choice - it's just not made to be used that way easily (as you've found out :-) Implementing backtracking isn't really that hard. All you really need is a stack. I'd start there if it was me.	[reply]
Re: Being Forced to Fork with Nested Regular Expressions by halley (Prior) on Jan 02, 2004 at 18:12 UTC
I also wanted to abuse the regex system with twelve-level-deep `s///e` recursion to implement my Pentominos Solving Quine, but instead found that perl would crash horribly if it tried to recurse there. `for $pattern (@patterns) { while ($data =~ m/(?=$pattern)/g) { do_recurse($data, pos($data), $pattern); # $data is the source # pos($data) is the found place in the souce ++pos($data); } }` [download] The alternative method uses `pos()` as an lvalue and `m/(?=)/g` as in the loop above, to allow me to iterate through matches yet not recurse via any regex code-evaluating trick. I do recurse, but not by getting the regex to do the call. -- `[ e d @ h a l l e y . c c ]`	[reply] [d/l] [select]
Re: Being Forced to Fork with Nested Regular Expressions by Aristotle (Chancellor) on Jan 02, 2004 at 17:41 UTC
What puzzles me the most about the question is why you'd specifically want to use the regex engine for this problem. It doesn't seem like a really bad fit, but it's not a particularly good one either. I'd be more inclined to use some multilevel hash entanglement to attack this problem domain. Makeshifts last the longest.	[reply]
Re: Re: Being Forced to Fork with Nested Regular Expressions by Ovid (Cardinal) on Jan 02, 2004 at 18:09 UTC
I've certainly tried a multi-level hash structure, but I started to get too complicated. When you say "entanglement", though, it makes me think you are implying superpositions. I've never actually tried that route. Very interesting idea, though. Cheers, Ovid New address of my CGI Course.	[reply]
Re^3: Being Forced to Fork with Nested Regular Expressions by Aristotle (Chancellor) on Jan 02, 2004 at 21:07 UTC
Actually that didn't even occur to me. I just meant multiple hashes forming a crossreference mesh. But if the Q::SP route seems viable, why not.. :) Makeshifts last the longest.	[reply]
Re: Being Forced to Fork with Nested Regular Expressions by diotalevi (Canon) on Jan 04, 2004 at 01:12 UTC
I wrote Regexp::Approx - Use fuzzy regular expressions (and forgot to submit it to CPAN) to solve the re-entrant regex problem for my specific problem. I forked during INIT, kept some file handles pointing to and from the child and then would send all the re-entrancy-requiring work to the child. You could pre-fork a number of children and then create more as needed. Alternatively, you could create a backtracking module for us and allow us to stop abusing regexes just to get at an already written engine.	[reply]
Re: Being Forced to Fork with Nested Regular Expressions by mnc (Beadle) on Jan 05, 2004 at 05:18 UTC
This my $code = <<END_CODE; owns(merlyn,gold). owns(ovid,books). owns(kudra,gold). valuable(perl). valuable(gold). valuable(corinna). query :- owns(X,Y), valuable(Y), writef('["%t", "%t"], ',[X,Y]). main :- writef('[ '), bagof(_,query,_), writef(']\n'). END_CODE my $tmpfile = "deleteme.pro"; open(TMP,">$tmpfile"); print TMP $code; close(TMP); my $result = `pl -q -f deleteme.pro -t main`; unlink($tmpfile); print $result; [download] yields this `[ ["merlyn", "gold"], ["kudra", "gold"], ]` [download] using SWI-Prolog. Perl is great at generating code, handing it to an industrial-strength backtracking engine, and scraping the result (though there is also Language::Prolog::Yaswi et.al.).	[reply] [d/l] [select]

Back to Seekers of Perl Wisdom