Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: finding longest common substring

by davido (Cardinal)
on Nov 20, 2003 at 05:23 UTC ( #308504=note: print w/replies, xml ) Need Help??


in reply to finding longest common substring

I haven't seen this method posted yet. It doesn't get fancy with regexps, but is fairly clear and simple to understand. Like the rest, scalability is an issue.

use strict; use warnings; my @first = qw/ short bigger longest superbig /; my @second = qw/ short longer even longer longest /; my $string = ""; foreach my $item ( @first ) { $string = $item if length $item > length $string and grep { $item eq $_ } @second; } print $string, "\n";

One way to improve the algorithm to scale better may be to keep the arrays ordered in decending order of length so that you could just stop searching on the first match. That would require more overhead at "insertion" time, but much less at searching time.


Dave


"If I had my life to live over again, I'd be a plumber." -- Albert Einstein

Replies are listed 'Best First'.
Re: Re: finding longest common substring
by revdiablo (Prior) on Nov 20, 2003 at 05:53 UTC

    I haven't looked at your solution in depth, but upon initial inspection it seems to do the same thing as pizza_milkshake's at Re: finding longest common substring. I must admit that -- at least to my eyes -- your version looks cleaner and is more understandable than his, but it doesn't seem like either of these solve the problem I was asking about. Perhaps my quick summary of the problem was unclear, or perhaps I am simply missing something. I was looking for the longest substring that is common to all elements of the list. In my example, I have 3 lists, but each one is analyzed independently. Both of your solutions seem to be searching two lists in parallel for the longest matching element. Maybe I'm just not looking at them from the right perspective?

      :) Ok, now that I better understand what you're asking, here's my regex solution. I think that this improves upon some other methods by starting with the longest and working down to the shortest, (quitting as soon as a match is found).

      use strict; use warnings; my @array = ( "this is a string is", "a string this is", "tie a string together" ); my $string = join "|", @array; my $repeat = '\|.*?\1.*?' x ( scalar(@array) - 1 ); for my $n ( reverse ( 1 .. length( $string ) ) ) { next unless $string =~ /(.{$n}).*?$repeat/; print $1, "\n"; last; }

      Ob-Update: This assumes the substrings don't overlap. And I've tinkered with the code to make it keep track of what the original substrings looked like. In so doing, I used the | character as a delimiter, which means it shouldn't appear in the original substrings. Thanks danger for the polite nudge. ;)


      Dave


      "If I had my life to live over again, I'd be a plumber." -- Albert Einstein

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://308504]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (3)
As of 2020-11-30 04:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?