Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Longest common substring

by japhy (Canon)
on Feb 15, 2002 at 03:26 UTC ( #145608=snippet: print w/replies, xml ) Need Help??
Description: To find the longest substring shared by two strings.
sub longest_common_substr {
  # provided you know there are no NULs
  my $str = join "\0", @_;
  my $len = 1;
  my $match;

  while ($str =~ m{ ([^\0]{$len,}) (?= [^\0]* \0 [^\0]*? \1 ) }xg) {
    $len = length($match = $1) + 1;
  }

  return $match;
}
Replies are listed 'Best First'.
Re: Longest common substring
by blakem (Monsignor) on Feb 16, 2002 at 00:29 UTC
    At the risk of getting rebuked again while commenting on a japhy regex ;-P

    Overlapping matches seem to cause a problem... For example, the last four characters in 'abcabc' and 'caWcabc' match, yet the function only returns the last three.

    print longest_common_substr('abcabc','caWcabc'); # 'abc' not 'cabc'
    I think it might be as simple as removing the /g (but thats the part I don't fully comprehend....) The code forces each match to be bigger than the previous one, with greedy matching helping us rachet up several steps at a time. The /g might be doing something else tricky, but I don't see it.

    Update: Other word pairs that fail similarly are sense/tense and onion/union.

    -Blake

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://145608]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2020-11-24 10:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?