Just another Perl shrine PerlMonks

### Matching bits of 2 strings

by m2 (Sexton)
 on Jan 22, 2002 at 04:09 UTC Need Help??
m2 has asked for the wisdom of the Perl Monks concerning the following question:

Hiyas :) Here's the situation: I have many pairs of 2 strings that may (or may not) match each other from the beginning up to a certain point, then become different. What's the "best" (efficient, clean, etc) way of determining 1). the matching parts of the two strings 2). the part of each that's different?
i.e. for:
$string1 = "the date is today";$string2 = "the date is tomorrow";
[download]
I want:
$matchingpart = "the date is to"$diff1 = "day"
$diff2 = "morrow" [download] It isn't hard to iterate over the first string letter by letter and do comparison to the second, but it really seems like there should be a more... elegant solution. Any ideas? Regards, m2. Replies are listed 'Best First'. Re: Matching bits of 2 strings by japhy (Canon) on Jan 22, 2002 at 04:50 UTC This is the "leading common substring" problem. There's been work in Perl done on it. Here's a relatively fast solution: # leading common substring # returns length of LCS # requires Perl 5.6+ sub lcs { ($_[0] ^ $_[1]) =~ /\0*/; return$+[0];
}
[download]
If you're not fortunate enough to have Perl 5.6 yet, here's a compatible approach:
# leading common substring
# returns length of LCS
# requires Perl 5.6+
sub lcs {
($_[0] ^$_[1]) =~ /(\0*)/;
return length $1; } [download] Update: in case you were worried, I do not need a ^ anchor in these regexes, because * will gladly match zero times at the beginning of the string if it has to. _____________________________________________________ Jeff[japhy]Pinyan: Perl, regex, and perl hacker. s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; Re: Matching bits of 2 strings by Fletch (Chancellor) on Jan 22, 2002 at 07:31 UTC I had this problem come up a while back (in fact I think it was my question that prompted the thread on fwp about it . . .). At any rate, this was something I had lying around from it that benchmarks three solutions. You can decide for yourself if the first is cheating or if you really, really care about speed. :) Also, note that these are returning the length of the common string not the string itself. #!/usr/bin/perl use Benchmark qw( cmpthese ); use Inline C => <<EOF; int comlen(char *p, char *q) { int i = 0; while( *p && (*p++ == *q++) ) i++; return i; } EOF sub comlen_or { length((($_[0]^$_[1])=~m/^(\0+)/)[0]); } sub comlen_tr { my($t );

return ($t=$_[0]^$_[1])=~ y/\0/\0/; }$a = "abcdefghijk";
$b = "abcdefg"; cmpthese( shift || 2_000_000, { inline_c => sub { comlen($a, $b ) }, comlen_or => sub { comlen_or($a, $b ) }, comlen_tr => sub { comlen_tr($a, $b ) }, }, ); exit 0; [download] You've beaten me to the use of C. I remembered that it would be faster to call an XS (or Inline::C) function here. However, I'd like to point out that using tr/// for this problem is not a help at all. Using tr/\0// merely tells you how many characters are the same between the two words in total, not how many characters are the same at the start of the string. This is the fatal flaw. I believe this flaw existed in the LCS presentation at a YAPC in years gone by. Example: calling the function with "brought" and "wrought" should return 0, except that the tr/// solution returns 6. _____________________________________________________ Jeff[japhy]Pinyan: Perl, regex, and perl hacker. s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; Re: Matching bits of 2 strings by Zaxo (Archbishop) on Jan 22, 2002 at 07:04 UTC If you sort a list of the strings into an array, the deepest matches will be neighbors. That allows you to terminate the search after relatively few applications of japhy's excellent advice. After Compline, Zaxo Re: Matching bits of 2 strings by Anonymous Monk on Jan 22, 2002 at 13:49 UTC My solution to the problem isn't as compact as the other solutions, but TIMTOWTDI! #! /usr/local/bin/perl$string1 = "the date is today";
$string2 = "the date is tomorrow";$concat = $string1 . "#" .$string2;

if ( $concat =~ m/^(.*)(.*)#\1(.*)$/ ) {
print "matching part: $1\n"; print "difference:$2\ndifference: $3\n"; } else { print "strings$string1 and $string2 do not match!\n"; } [download] When run it generates the following output matching part: the date is to difference: day difference: morrow [download] I have to note that this solutions uses the fact dat the to strings doesn't contain any poundsymbols (#). When one of the to strings contain a poundsymbol, you have to use another symbol(sequence) to, for example: ...$concat = $string1 . "#!!#" .$string2;

if ( $concat =~ m/^(.*)(.*)#!!#\1(.*)$/ ) {
...
[download]
I hope that this will give you a hint to construct your own solution!

Create A New User
Node Status?
node history
Node Type: perlquestion [id://140538]
Approved by root
help
Chatterbox?
 [LanX]: I want the command line in the history [tye]: -Mouse [Corion]: Option a) would mean launching cmd.exe /k c:\path\to\ batchfile- launching-perl- script.cmd. Option b) would be to add pause as the last line of said batch file. [LanX]: First day after holidays ... and already stressed by the fact that colleagues changed stuff without communication ... apparently I'm the only one trying to fight entropy [Corion]: LanX: The command is always in the history if you typed it in before. If you didn't type the command into the command line, it will not be there. I think there is doskey which can stuff command lines into the history LanX damns the cult of CB ;-) LanX WTF WTF WTF [LanX]: please forget my last 3 posts [LanX]: Yeah option a doesn't go into history [LanX]: probably I need to teach the app to restart after C-c Kill

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (9)
As of 2017-03-27 15:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Should Pluto Get Its Planethood Back?

Results (320 votes). Check out past polls.