Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Challenge: Fast Common Substrings

by Moron (Curate)
on Apr 04, 2007 at 10:01 UTC ( #608236=note: print w/ replies, xml ) Need Help??


in reply to Challenge: Fast Common Substrings

# more update: this is the tested version now!! sub CountSubstrings { # $string1, $string2, $substr_length my $l = pop @_; my %found = (); my %match = (); my $first = 2; my $string1 = $_[0]; for my $string ( @_ ) { $first --; my $ls = length( $string ); my $limit = $ls - $l + 1; for ( my $i = 0; $i < $limit; $i++ ) { my $sbstr = substr( $string, $i, $l ); $first or defined ( $match{$sbstr} ) && next(); $found{ $sbstr }{ $string } ||= 1; $first and next;; defined ( $found{ $sbstr }{ $string1 } ) and $match{ $sbstr } = 1; } } Count( keys %match ); } sub Count { 1+$#_ };
Description of alggorithm: Store all substrings of string 1 of the given length in subhash $found{ $string1 }; do the same for string 2 but in addition lookup in the first hash and record any matches in the %match hash. The suggested opportunities for optimisation include (a) only need to check matches for the substring iteration of the second string. (b) only need to build the substring for string 2 if there is no match found in the match hash.

-M

Free your mind


Comment on Re: Challenge: Fast Common Substrings
Download Code
Re^2: Challenge: Fast Common Substrings
by BrowserUk (Pope) on Apr 04, 2007 at 10:25 UTC
    # untested

    True. It doesn't even compile. And when you've corrected the numerous errors, it doesn't work. And it's hard to see how you thought it would. Care to enlighten? (And did you notice the word "fast" in the title?)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I think many people will see this is only an algorithm outline and will take the word "untested" as an approriate hint in that regard. I am multitasking at the moment.

      re "enlighten" - I think under the circumstances the most appropriate enlightment I can offer is to improve your lateral thinking In particular I recommend Six Thinking Hats. You would benefit from the insight that you tend to be stuck wearing a black hat (negative logic) and need to learn how to take it off at will and replace it with other forms of thinking when needed. If you need evidence - In your post you say on the one hand you find it difficult to see how I expect it to work and yet you go on to imply an ability to assess its performance when corrected. That's like saying the performance of anything in a category is X without realising the scope you have already accepted for that category, making it impossible to make any sensible such judgement. Conversely, all I am offering is a means to use hashes to solve the problem and I sincerely expect this to perform well when corrected given the pure simplicity of the algorithm and the opportunities it uses to cut out unnecessary iteration. I think you will find that many beginners out there will find it easy enough to correct this code and get a fast-performing result. So why not you, eh? You're clearly not nearly a beginner are you. So what else can it be ... ?

      -M

      Free your mind

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://608236]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (2)
As of 2014-08-31 03:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (294 votes), past polls