Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Re: Challenge: Fast Common Substrings

by thezip (Vicar)
on Apr 04, 2007 at 05:04 UTC ( #608206=note: print w/replies, xml ) Need Help??

in reply to Challenge: Fast Common Substrings

I throw my hat into the ring with my recursive implementation. I think it could perform reasonably well, since it is a divide-and-conquer type solution.

I'm not sure how it will stand up to the hash-based solutions, and there might be some "correctness" issues...

For the bonus, though, mine increments the count for *any* length of matching substrings.

There are probably many opportunities for optimizations here... please offer criticism.

#! perl -w use strict; my $hits = 0; sub common_substr { my($s1, $s2) = @_; print qq(s1 = $s1, s2 = $s2\n); ($s1, $s2) = ($s2, $s1) if length($s2) < length($s1); if ($s1 eq $s2) { $hits++; return if length($s1) == 1; } my %hash = map { $_ => 1 } split(//, $s1); my $arr = []; for my $s (split(//, $s2)) { push(@$arr, $s) if ! exists($hash{$s}); } my $splitters = join('|', @$arr); for my $s (split(/$splitters/, $s2)) { common_substr($s, $s1); } } common_substr("abcef", "abcdef"); print "hits = $hits\n";

Where do you want *them* to go today?

Replies are listed 'Best First'.
Re^2: Challenge: Fast Common Substrings
by thezip (Vicar) on Apr 04, 2007 at 20:45 UTC

    OK, I concede to the Suffix Tree solution presented by lima1 ++.

    I suspect the best run-time order I could muster is O(nlogn), and worst O(n²).

    It was still a fun diversion nonetheless... :-)

    Where do you want *them* to go today?

      I wouldn't give up yet. You need to see an implementation and how it benchmarks.

      Implementing Suffix Trees in pure perl is messy, memory expensive (hugely so in my attempts), and rarely works out more efficient. Don't forget that it's O(n) to build as well as O(m) to use. If you are only using it once, that snips into the benefits.

      For short strings a linear search (at C speed) and no build time wins easily. For longer strings, the indexing a suffix tree provides is a winner, but it gets set back awfully by the build time and memory management. If you only use the tree the once (as for this challenge), there little or no win.

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://608206]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (6)
As of 2018-03-18 16:43 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (230 votes). Check out past polls.