Re^10: partial match between 2 files

in reply to Re^9: partial match between 2 files
in thread partial match between 2 files

I need to print the word in the file which has the maximum partial match to the inputted string.ie if the inputted string is fallen and the file has words fal,fall,falle,the value to be printed is falle.The code below gives errors

#!/usr/bin/perl-w 
use warnings;
use strict;

open my $fh1, '<', 'words'  or die $!;
my $a=$_;
while (my  $f1 = <$fh1> ) {
    chomp $f1;
    my $f2 = $a;
        chomp $f2;
if($f2=~m/$f1/i){print"$f1\n";}
}
[download]

Comment on Re^10: partial match between 2 files Download Code

Replies are listed 'Best First'.

Re^11: partial match between 2 files
by Anonymous Monk on Dec 27, 2012 at 10:21 UTC

The code below gives errors

Maybe you should figure out what they mean :/ use diagnostics

[reply]

Re^11: partial match between 2 files
by Athanasius (Archbishop) on Dec 27, 2012 at 16:36 UTC

The line my$a=$_; does nothing useful, since $_ is uninitialised. As Anonymous Monk says, you need to read from STDIN. For example:

#! perl
use strict;
use warnings;

print "Enter the target word: ";
chomp(my $target = <STDIN>);

my $in_file = 'words.txt';
open(my $in, '<', $in_file) or die "Cannot open file '$in_file' for re
+ading: $!";
my @matches;

while (<$in>)
{
    chomp;
    push @matches, $_ if $target =~ /$_/i;
}

close $in                   or die "Cannot close file '$in_file': $!";

@matches = sort { length $a <=> length $b } @matches;
print "The closest match is:  ", $matches[-1], "\n";
[download]

If the file “words.txt” contains:

fal
falle
fall
[download]

then, when “fallen” is entered from the keyboard, the output of the above script is:

 2:31 >perl 454_SoPW.pl
Enter the target word: fallen
The closest match is:  falle

 2:31 >
[download]

Hope that helps,

Update: Fixed error in sort: changed > to <=>. Also changed the order of words in the input file.

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re^12: partial match between 2 files

by lakssreedhar (Acolyte) on Dec 28, 2012 at 06:24 UTC

The code produces an error "Use of uninitialized value in print" when a word awiSayanA is compared with 2 words
awiSayaM
awiSayanZ
present in dictionary.Output is not produced.

[reply]

Re^13: partial match between 2 files

by Athanasius (Archbishop) on Dec 28, 2012 at 09:53 UTC

OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:

Read more... (2 kB)

Update 1 (1^st January, 2013):

Algorithm::Diff is actually the wrong module for this, I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:

#! perl
use strict;
use warnings;
use String::LCSS;

use constant {
                 CASE_SENSITIVE  =>  0,
                 DICTIONARY_FILE => 'words.txt',
             };

print 'Enter the target word:    ';
chomp(my $orig_target = <STDIN>);
my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target;

open(my $in, '<', DICTIONARY_FILE)
    or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $!
+";
my %substrings;

while (my $orig_word = <$in>)
{
    chomp $orig_word;
    my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word;
    my @lcss = lcss($word, $target);
    $substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $l
+css[0];
}

close $in
    or die "Cannot close file '" . DICTIONARY_FILE . "': $!";

print 'Target:                   ', $orig_target, "\n";

if (%substrings)
{
    my $key    = (sort { length $a <=> length $b } keys %substrings)[-
+1];
    my $match  =  $substrings{ $key }->[0];
    my $index2 =  $substrings{ $key }->[2];
    my $substr =  substr($orig_target, $index2, length $key);

    print 'Closest match:            ', $match,  "\n";
    print 'Longest common substring: ', $substr, "\n";
}
else
{
    print "No matches found\n";
}

sub lcss
{
    my ($first, $second) = @_;

    $first  .= '$';     # force strings to be different:
    $second .= '@';     #   kludge required by String::LCSS::lcss

    my @results = String::LCSS::lcss($first, $second);

    return wantarray ? @results : $results[0];
}
[download]

Update 2 (1^st January, 2013):

It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:

18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc')
+;"
ab

18:28 >
[download]

(And see http://cpanratings.perl.org/dist/String-LCSS.)

Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:

sub lcss
{
    my $strings = join "\0", @_;
    my $lcs;

    for my $n (1 .. length $strings)
    {
        my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
        last unless $strings =~ $re;
        $lcs = $1;
    }

    return $lcs;
}
[download]

Update 3 (2^nd January, 2013):

Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re^14: partial match between 2 files

by lakssreedhar (Acolyte) on Jan 08, 2013 at 09:14 UTC

Re^15: partial match between 2 files

by Athanasius (Archbishop) on Jan 10, 2013 at 16:30 UTC

Some notes below your chosen depth have not been shown here

In Section Seekers of Perl Wisdom