PerlMonks  

Re: Re: Pattern Finding

by runrig (Abbot)
on Sep 13, 2001 at 03:20 UTC (#112057)


in reply to Re: Pattern Finding
in thread Pattern Finding

Here is something that I believe almost satisfies your requirements (it just needs a bit more work, which I'm not ready to do at the moment, and it's not thoroughly tested). It doesn't do very well without a min and max length for each pattern, so maybe if this were wrapped in a sub which adjusted the min and max to various sizes, and evaluated the results of each pass by some heuristic, it could do fairly well with all of the requirements (and that'll have to wait 'til later):

    use warnings;
    use strict;

    # Min and max length for each pattern
    my $min = 2;
    my $max = 8;

    # Number of patterns
    my $num = shift;

    # Generate pattern to capture words
    my $words = join ('', map {
      "(.{$min,$max})" . "(?:" . join( '|', map("\\$_", 1..$_)) . ")*"
    } 1..$num);

    $_="bookhelloworldhellohellohihellohiworldhihelloworldhihellobookpenbookpenworld";

    if (my @pats = /^$words$/) {
      for my $pat (@pats) {
        print "[$pat]\n";
      }
    }

    stan:~/tmp >./tst 5
    [book]
    [hello]
    [world]
    [hi]
    [pen]
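To see what the map/join above actually builds, it can help to print the generated pattern for a small count. This is only a minimal sketch: build_regex is a name made up here for illustration and is not part of the posted script, though the construction inside it is the same.

```perl
use strict;
use warnings;

# Hypothetical helper: same construction as the post, wrapped in a sub
# so the generated pattern can be inspected.
sub build_regex {
    my ($num, $min, $max) = @_;
    return join '', map {
        my $n = $_;
        # Capture a new word, then allow any already-captured word to repeat.
        "(.{$min,$max})(?:" . join('|', map("\\$_", 1 .. $n)) . ")*"
    } 1 .. $num;
}

print build_regex(2, 2, 8), "\n";
# prints: (.{2,8})(?:\1)*(.{2,8})(?:\1|\2)*
```

Each capture group introduces a new word, and the non-capturing group after it lets any word seen so far repeat before the next new word starts.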
Update: Greatly simplified. Wondering if I'm doing someone's homework. Noticed that it's very similar to nardo's approach, but cleaner, I think, with slightly different behavior due to the newest problem definition. Great minds think alike :)


Re: Re: Re: Pattern Finding
by Anonymous Monk on Sep 13, 2001 at 19:15 UTC
    Hi, this is one of the classic problems in AI.

    The problem I posted is actually an exercise in the segmentation section of the OpenLab at http://www.a-i.com (you will need to register). I have extended it with some other criteria, such as 'spaces allowed', to cover more general problems. I tried runrig's solution, and it doesn't work when the number of patterns is 6, for the condition that one pattern cannot be part of another pattern.

    I am also trying to solve this problem myself; what I am looking for is a good design to begin with.

    Artist.
    (My computer doesn't keep the login for more than one page. Please let me know if you know the solution.)

      for the condition that one pattern cannot be part of another pattern

      This is the toughest condition, and so I don't think it can be done with a regex, at least not with perl's regex engine (hope someone can prove me wrong :). At every stage of capturing a pattern, you'd have to be able to fail if the longer of the current pattern and any past pattern contains the other. Here's some pseudo perl regex code which, if it worked, would accomplish this (hope you get the idea). I'm using things in the wrong way: the regex engine isn't re-entrant, it uses "$1" instead of "\1" (and in a symbolic reference sort of way), etc., but I thought it was interesting nonetheless. It would go right after each pattern capture in my solution:

      join('', map { "(?{(length($$i)>length($$_))$$i !~ /$$_/ | $$_ !~ /$$i/})" } 1..$_)
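For comparison, the same check is easy to express outside the regex engine. What follows is a rough sketch of a plain recursive backtracking search, not the approach from the posts above: the names independent and segment are invented here, and it assumes that exactly $num patterns are wanted, each between $min and $max characters long.

```perl
use strict;
use warnings;

# True if neither string contains the other (hypothetical helper).
sub independent {
    my ($x, $y) = @_;
    return index($x, $y) < 0 && index($y, $x) < 0;
}

# Split $str from offset $pos into repetitions of exactly $num patterns.
# Returns an array ref of patterns in order of first appearance, or undef.
sub segment {
    my ($str, $num, $min, $max, $pos, @pats) = @_;
    if ($pos == length($str)) {
        return @pats == $num ? [@pats] : undef;
    }
    # First try to continue with a pattern we have already seen.
    for my $p (@pats) {
        next unless substr($str, $pos, length($p)) eq $p;
        my $found = segment($str, $num, $min, $max, $pos + length($p), @pats);
        return $found if $found;
    }
    # Otherwise introduce a new pattern, longest first (like a greedy regex).
    if (@pats < $num) {
        for my $len (reverse $min .. $max) {
            next if $pos + $len > length($str);
            my $cand = substr($str, $pos, $len);
            # Reject a candidate that contains, or is contained in, an old pattern.
            next if grep { !independent($cand, $_) } @pats;
            my $found = segment($str, $num, $min, $max, $pos + $len, @pats, $cand);
            return $found if $found;
        }
    }
    return undef;
}

my $result = segment("abb", 2, 1, 2, 0);
print "[$_]\n" for @{ $result || [] };
# prints:
# [a]
# [bb]
```

On "abb" the unconstrained regex approach would settle for "ab" and "b", but "b" is part of "ab", so the search backs off to "a" and "bb". Like the regex, this returns the first split it finds, so it still depends on sensible min and max values.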
