Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Longest common substring

by japhy (Canon)
on Feb 15, 2002 at 03:26 UTC ( #145608=snippet: print w/replies, xml ) Need Help??
Description: To find the longest substring shared by two strings.
sub longest_common_substr {
  # provided you know there are no NULs
  my $str = join "\0", @_;
  my $len = 1;
  my $match;

  while ($str =~ m{ ([^\0]{$len,}) (?= [^\0]* \0 [^\0]*? \1 ) }xg) {
    $len = length($match = $1) + 1;

  return $match;
Replies are listed 'Best First'.
Re: Longest common substring
by blakem (Monsignor) on Feb 16, 2002 at 00:29 UTC
    At the risk of getting rebuked again while commenting on a japhy regex ;-P

    Overlapping matches seem to cause a problem... For example, the last four characters in 'abcabc' and 'caWcabc' match, yet the function only returns the last three.

    print longest_common_substr('abcabc','caWcabc'); # 'abc' not 'cabc'
    I think it might be as simple as removing the /g (but thats the part I don't fully comprehend....) The code forces each match to be bigger than the previous one, with greedy matching helping us rachet up several steps at a time. The /g might be doing something else tricky, but I don't see it.

    Update: Other word pairs that fail similarly are sense/tense and onion/union.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://145608]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2020-11-24 10:18 GMT
Find Nodes?
    Voting Booth?

    No recent polls found