PerlMonks  

Re: Pattern Finding

by tachyon (Chancellor)
on Sep 12, 2001 at 09:16 UTC (node #111853)


in reply to Pattern Finding

Here is my effort. It finds every substring of two or more characters that occurs more than once, then does a quick dictionary lookup to pick out the real words. If you don't have a dictionary text file, you can choose from a wide variety on the National Puzzlers' League Word Lists page.

```perl
use strict;
use warnings;

my $str = 'helloworldhellohellohihellohiworld';
my %hash;

# grab all the substrings of 2+ chars and count occurrences in a hash
for my $i ( 0 .. length($str) - 1 ) {
    $hash{ substr($str, $i, $_) }++ for 2 .. length($str) - $i;
}

# sort on occurrences and then in alphabetical order;
# only select elements that occur more than once using grep
my @order = sort { $hash{$b} <=> $hash{$a} || $a cmp $b }
            grep { $hash{$_} > 1 } keys %hash;

print "\nPatterns found:\n\n";
print "$hash{$_} occurrences of $_\n" for @order;

# now grab the dictionary file into a hash
# (you can save memory using Search::Dict)
my %dict;
open my $dict_fh, '<', 'c:/windows/desktop/dict.txt'
    or die "Can't open dictionary: $!";
while ( my $word = <$dict_fh> ) {
    chomp $word;
    $dict{$word}++;
}
close $dict_fh;

my @words = grep { exists $dict{$_} } @order;
print "\nReal words found in dictionary:\n\n";
print "$hash{$_} occurrences of $_\n" for @words;

# remove substrings of larger words that occur the same number of times
@words = sort { length($b) <=> length($a) } @words;
for my $i ( 0 .. $#words - 1 ) {
    for my $j ( $i + 1 .. $#words ) {
        $hash{ $words[$j] } = 0
            if $words[$i] =~ m/\Q$words[$j]/
            and $hash{ $words[$i] } == $hash{ $words[$j] };
    }
}

# regenerate sort order, grepping out unwanted substrings
# (their occurrences were set to zero above)
@words = sort { $hash{$b} <=> $hash{$a} || $a cmp $b }
         grep { $hash{$_} } @words;

print "\nBest Matches:\n\n";
print "$hash{$_} occurrences of $_\n" for @words;

__END__
# sample output

Patterns found:

4 occurrences of el
4 occurrences of ell
4 occurrences of ello
4 occurrences of he
4 occurrences of hel
4 occurrences of hell
4 occurrences of hello
4 occurrences of ll
4 occurrences of llo
4 occurrences of lo
3 occurrences of elloh
3 occurrences of helloh
3 occurrences of lloh
3 occurrences of loh
3 occurrences of oh
2 occurrences of ellohi
2 occurrences of hellohi
2 occurrences of hi
2 occurrences of ld
2 occurrences of llohi
2 occurrences of lohi
2 occurrences of ohi
2 occurrences of or
2 occurrences of orl
2 occurrences of orld
2 occurrences of rl
2 occurrences of rld
2 occurrences of wo
2 occurrences of wor
2 occurrences of worl
2 occurrences of world

Real words found in dictionary:

4 occurrences of el
4 occurrences of ell
4 occurrences of he
4 occurrences of hell
4 occurrences of hello
4 occurrences of lo
3 occurrences of oh
2 occurrences of hi
2 occurrences of or
2 occurrences of wo
2 occurrences of world

Best Matches:

4 occurrences of hello
3 occurrences of oh
2 occurrences of hi
2 occurrences of world
```
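The script's comment mentions Search::Dict as a way to save memory. Here is a minimal sketch of that idea. It assumes the word list has one word per line and is sorted in plain `cmp` order. Instead of loading every word into `%dict`, it calls `look` from the core Search::Dict module, which binary-searches the open file. The helper name `in_sorted_dict` and the demo words are hypothetical. Each lookup costs a few seeks and reads, so this trades some speed for memory.

```perl
use strict;
use warnings;
use Search::Dict;                 # core module, exports look()
use File::Temp qw(tempfile);      # only used to build a demo word list

# Hypothetical helper: true if $word is a whole line of the sorted word
# list open on $fh. Assumes one word per line, sorted in plain cmp order.
sub in_sorted_dict {
    my ($fh, $word) = @_;
    # look() searches the file and leaves $fh positioned at the first
    # line that is ge $word; it returns -1 on error.
    return 0 if look($fh, $word) < 0;
    my $line = <$fh>;
    return 0 unless defined $line;    # $word sorts after every line
    chomp $line;
    return $line eq $word ? 1 : 0;
}

# Demo: write a tiny sorted word list to a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for sort qw(hello hell world oh hi el);
close $out;

open my $dict_fh, '<', $path or die "Can't open $path: $!";
for my $candidate (qw(hello ello world orld)) {
    printf "%-5s %s\n", $candidate,
        in_sorted_dict($dict_fh, $candidate) ? 'real word' : 'not found';
}
```

In the main script you would open the real dictionary once and replace `exists $dict{$_}` with `in_sorted_dict($dict_fh, $_)`. The file must be sorted the same way `look` compares, or the search will miss words.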
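The script needs a dictionary at a hard-coded path, so the "Best Matches" pruning step is awkward to try on its own. Below is a sketch of that step as a standalone sub. The name `best_matches` is hypothetical. It is fed the counts listed under "Real words found in dictionary" in the sample output. The rule is unchanged: a word is dropped when it appears inside a longer word with the same count. The one change is that it uses `index` instead of the regex `m/\Q.../` for the literal substring test.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hashref of word => occurrence count, drop
# every word that appears inside a longer word with the same count, and
# return the survivors sorted by count (descending), then alphabetically.
sub best_matches {
    my ($counts) = @_;
    my %c = %$counts;    # work on a copy so the caller's hash is untouched
    my @words = sort { length($b) <=> length($a) } keys %c;
    for my $i ( 0 .. $#words - 1 ) {
        for my $j ( $i + 1 .. $#words ) {
            $c{ $words[$j] } = 0
                if index( $words[$i], $words[$j] ) >= 0
                and $c{ $words[$i] } == $c{ $words[$j] };
        }
    }
    return sort { $c{$b} <=> $c{$a} || $a cmp $b } grep { $c{$_} } @words;
}

# Counts copied from "Real words found in dictionary" in the sample output.
my %real = qw(el 4 ell 4 he 4 hell 4 hello 4 lo 4
              oh 3 hi 2 or 2 wo 2 world 2);
my @best = best_matches( \%real );
print "$real{$_} occurrences of $_\n" for @best;
# prints hello (4), oh (3), hi (2), world (2), matching "Best Matches"
```

A substring survives when its count differs from the longer word's count. For example, `oh` (3) is kept because no longer word containing it also occurs three times.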

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

