Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: Challenge: Fast Common Substrings

by BrowserUk (Pope)
on Apr 04, 2007 at 02:54 UTC ( #608189=note: print w/ replies, xml ) Need Help??


in reply to Challenge: Fast Common Substrings

Here's my best attempt so far:

sub nCommonSubstrLenL { my( $haystack, $needle, $len ) = @_; ( $haystack, $needle ) = ( $needle, $haystack ) if length( $haystack ) < length( $needle ); # Added my $count = 0; my %possibles; for my $ni ( 0 .. length( $needle ) - $len ) { my $possible = substr( $needle, $ni, $len ); next if ++$possibles{ $possible } > 1; ++$count if 1+index $haystack, $possible; } return $count; }

Update: A slightly faster reformulation. Updated again to work for lengths other than 2.

sub nCommonSubstrLenL2 { my( $haystack, $needle, $len ) = @_; ( $haystack, $needle ) = ( $needle, $haystack ) if length( $haystack ) < length( $needle ); ## Added. # my $pattern = "A$len X" x int( length( $needle ) / $len ); my $pattern = (" A$len" . 'X' x ( $len-1 )) x (length( $needle ) - + $len +1); my $count = 0; my %possibles; for my $possible ( unpack $pattern, $needle ) { next if ++$possibles{ $possible } > 1; ++$count if 1+index $haystack, $possible; } return $count; }

And buking for the bonus, a not well tested version that looks for all common substring equal or greater in length than the user specifed parameter:

sub nCommonSubstrGreaterLenL { my( $haystack, $needle, $len ) = @_; my $count = 0; my %possibles; for my $l ( $len .. length( $needle ) ) { my $sofar = $count; for my $ni ( 0 .. length( $needle ) - $l ) { my $possible = substr( $needle, $ni, $l ); next if ++$possibles{ $possible } > 1; ## print( "$possible : $count" ), ++$count if 1+index $haystack, $possible; } last unless $count > $sofar; } return $count; } print 'Buk:', nCommonSubstrGreaterLenL 'ABCDEF','ABDEFCBDEAB', 2; __END__ AB : 0 DE : 1 EF : 2 DEF : 3 Buk:4

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re: Challenge: Fast Common Substrings
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://608189]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (9)
As of 2015-07-29 05:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (260 votes), past polls