Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Challenge: Fast Common Substrings

by BrowserUk (Pope)
on Apr 04, 2007 at 02:54 UTC ( #608189=note: print w/ replies, xml ) Need Help??


in reply to Challenge: Fast Common Substrings

Here's my best attempt so far:

sub nCommonSubstrLenL { my( $haystack, $needle, $len ) = @_; ( $haystack, $needle ) = ( $needle, $haystack ) if length( $haystack ) < length( $needle ); # Added my $count = 0; my %possibles; for my $ni ( 0 .. length( $needle ) - $len ) { my $possible = substr( $needle, $ni, $len ); next if ++$possibles{ $possible } > 1; ++$count if 1+index $haystack, $possible; } return $count; }

Update: A slightly faster reformulation. Updated again to work for lengths other than 2.

sub nCommonSubstrLenL2 { my( $haystack, $needle, $len ) = @_; ( $haystack, $needle ) = ( $needle, $haystack ) if length( $haystack ) < length( $needle ); ## Added. # my $pattern = "A$len X" x int( length( $needle ) / $len ); my $pattern = (" A$len" . 'X' x ( $len-1 )) x (length( $needle ) - + $len +1); my $count = 0; my %possibles; for my $possible ( unpack $pattern, $needle ) { next if ++$possibles{ $possible } > 1; ++$count if 1+index $haystack, $possible; } return $count; }

And buking for the bonus, a not well tested version that looks for all common substring equal or greater in length than the user specifed parameter:

sub nCommonSubstrGreaterLenL { my( $haystack, $needle, $len ) = @_; my $count = 0; my %possibles; for my $l ( $len .. length( $needle ) ) { my $sofar = $count; for my $ni ( 0 .. length( $needle ) - $l ) { my $possible = substr( $needle, $ni, $l ); next if ++$possibles{ $possible } > 1; ## print( "$possible : $count" ), ++$count if 1+index $haystack, $possible; } last unless $count > $sofar; } return $count; } print 'Buk:', nCommonSubstrGreaterLenL 'ABCDEF','ABDEFCBDEAB', 2; __END__ AB : 0 DE : 1 EF : 2 DEF : 3 Buk:4

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re: Challenge: Fast Common Substrings
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://608189]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (10)
As of 2014-12-26 04:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (165 votes), past polls