OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:
Update 1 (1st January, 2013):
Algorithm::Diff is actually the wrong module for this, I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:
#! perl
use strict;
use warnings;
use String::LCSS;
use constant {
CASE_SENSITIVE => 0,
DICTIONARY_FILE => 'words.txt',
};
print 'Enter the target word: ';
chomp(my $orig_target = <STDIN>);
my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target;
open(my $in, '<', DICTIONARY_FILE)
or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $!
+";
my %substrings;
while (my $orig_word = <$in>)
{
chomp $orig_word;
my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word;
my @lcss = lcss($word, $target);
$substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $l
+css[0];
}
close $in
or die "Cannot close file '" . DICTIONARY_FILE . "': $!";
print 'Target: ', $orig_target, "\n";
if (%substrings)
{
my $key = (sort { length $a <=> length $b } keys %substrings)[-
+1];
my $match = $substrings{ $key }->[0];
my $index2 = $substrings{ $key }->[2];
my $substr = substr($orig_target, $index2, length $key);
print 'Closest match: ', $match, "\n";
print 'Longest common substring: ', $substr, "\n";
}
else
{
print "No matches found\n";
}
sub lcss
{
my ($first, $second) = @_;
$first .= '$'; # force strings to be different:
$second .= '@'; # kludge required by String::LCSS::lcss
my @results = String::LCSS::lcss($first, $second);
return wantarray ? @results : $results[0];
}
Update 2 (1st January, 2013):
It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:
18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc')
+;"
ab
18:28 >
(And see http://cpanratings.perl.org/dist/String-LCSS.)
Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:
sub lcss
{
my $strings = join "\0", @_;
my $lcs;
for my $n (1 .. length $strings)
{
my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
last unless $strings =~ $re;
$lcs = $1;
}
return $lcs;
}
Update 3 (2nd January, 2013):
Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.
Hope that helps,
|