http://www.perlmonks.org?node_id=111640


in reply to Pattern Finding

Yow. I'm not sure who's crazier - you for suggesting this might be something one would want to do, or me for trying to do it ;)

Because It's There, as the man said.

Having struggled with it a bit I realised one thing about the question itself, which is that we can't say there are only three patterns. In fact there are a lot more - "hell", "hel", and "he" to name but the most obvious additions. That's unless we want to match against a dictionary, in which case it's just a matter of processing power.
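To make that point concrete, here is a minimal sketch that counts how often a few candidate substrings turn up in the sample string. The helper name `count_occurrences` is my own invention for illustration, and stepping one character at a time (so that overlapping hits count too) is an assumption about what "occurrence" should mean here.

```perl
use strict;
use warnings;

my $string = "helloworldhellohellohihellohiworld";

# Count how many times $needle appears in $haystack.
# (Hypothetical helper, not part of the original post.)
sub count_occurrences {
    my ($haystack, $needle) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) >= 0) {
        $count++;
        $pos++;    # step one char so overlapping hits are counted too
    }
    return $count;
}

# "hello" and each of its shorter prefixes
for my $candidate (qw(hello hell hel he)) {
    print count_occurrences($string, $candidate),
          ' occurrences of "', $candidate, '"', "\n";
}
```

Every one of the four candidates comes back with the same count, so on frequency alone there is nothing to single out "hello" over its own prefixes.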

Assuming we are interested in patterns rather than specific words, I think the following does it. I should say at the outset that the clever bit in this comes from japhy's regex book, which is referred to in this node.
    use strict;
    use warnings;

    my $string = "helloworldhellohellohihellohiworld";
    my $length = length $string;
    my $window = int (($length - 2) / 2);

    # use japhy's regex to hoover up all char
    # sequences that MIGHT be patterns:
    my @pats;
    my $regex;
    while ($window > 1) {
        $regex = '(?=(' . '.' x $window . '))';
        push @pats, ($string =~ /$regex/g);
        $window --;
    }

    # now go through @pats to find the duplicates
    # and print the final result
    @pats = sort @pats;
    my %dups;
    # start at 1, not 2, so the first pair ($pats[0], $pats[1]) is compared too
    for (1 .. $#pats) {
        $dups{$pats[$_]} ++ if ($pats[$_] eq $pats[$_ - 1])
    }
    # n occurrences give n - 1 adjacent matches, so add one back
    $dups{$_} ++ for keys %dups;
    for (keys %dups) {
        print $dups{$_},' occurrences of "',$_,'"',"\n"
    }
This throws up 31 patterns, with up to four occurrences each. (BTW, in case $window doesn't make sense, I assumed (A) there must be at least two occurrences of each pattern, otherwise it wouldn't really be a pattern; (B) each pattern must be at least 2 chars and (C) there must be at least 2 patterns.)
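For anyone who wants to check those numbers, here is a rough sketch of the same idea with the counting done in a hash rather than by sorting and comparing neighbours. It is a swapped-in tallying step, not the code above. The sub name `repeated_substrings` and its `$min`/`$max` parameters are made up for illustration. The lookahead capture and the upper bound on the window are taken from the post.

```perl
use strict;
use warnings;

# Tally every substring of $min .. $max characters, overlaps included,
# and keep only those seen at least twice.
# (Hypothetical helper; the hash tally replaces the sort-and-compare step.)
sub repeated_substrings {
    my ($string, $min, $max) = @_;
    my %count;
    for my $window ($min .. $max) {
        my $chunk = '.' x $window;
        # The capture sits inside a zero-width lookahead, so each match
        # advances by one character and overlapping substrings are all seen.
        $count{$_}++ for $string =~ /(?=($chunk))/g;
    }
    # Drop anything that occurred only once.
    delete @count{ grep { $count{$_} < 2 } keys %count };
    return %count;
}

my $string = "helloworldhellohellohihellohiworld";
my $max    = int((length($string) - 2) / 2);    # same upper bound as the post
my %dups   = repeated_substrings($string, 2, $max);

print scalar(keys %dups), " patterns found\n";
print $dups{$_}, ' occurrences of "', $_, '"', "\n" for sort keys %dups;
```

The hash does away with the "add one back" step, since each substring is counted directly, and sorting the keys makes the output order repeatable from run to run.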

Thanks for making me think. Can I stop now?

§ George Sherston