Re^11: partial match between 2 files

by Athanasius (Chancellor)
on Dec 27, 2012 at 16:36 UTC ( #1010548=note )

in reply to Re^10: partial match between 2 files
in thread partial match between 2 files

The line my$a=$_; does nothing useful, since $_ is uninitialised. As Anonymous Monk says, you need to read from STDIN. For example:

    #! perl
    use strict;
    use warnings;

    print "Enter the target word: ";
    chomp(my $target = <STDIN>);

    my $in_file = 'words.txt';
    open(my $in, '<', $in_file)
        or die "Cannot open file '$in_file' for reading: $!";

    my @matches;

    while (<$in>)
    {
        chomp;
        push @matches, $_ if $target =~ /$_/i;
    }

    close $in or die "Cannot close file '$in_file': $!";

    @matches = sort { length $a <=> length $b } @matches;

    print "The closest match is: ", $matches[-1], "\n";

If the file “words.txt” contains:

    fal
    falle
    fall

then, when “fallen” is entered from the keyboard, the output of the above script is:

    2:31 >perl
    Enter the target word: fallen
    The closest match is: falle
    2:31 >

Hope that helps,

Update: Fixed error in sort: changed > to <=>. Also changed the order of words in the input file.

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^12: partial match between 2 files
by lakssreedhar (Acolyte) on Dec 28, 2012 at 06:24 UTC

    The code produces the error "Use of uninitialized value in print" when the word awiSayanA is compared with 2 words
    present in the dictionary. No output is produced.
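One likely cause, assuming neither dictionary word occurs as a substring of the target: @matches is then empty, so $matches[-1] is undef and print warns. Below is a minimal sketch of a guard. The sub name closest_match is hypothetical, and the \Q...\E quoting is an addition (the script above does not have it) so that dictionary words are matched literally rather than as regex patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest dictionary word that occurs
# inside $target (case-insensitively), or nothing if no word does.
sub closest_match {
    my ($target, @words) = @_;

    # Skip empty lines; \Q...\E makes regex metacharacters literal.
    my @matches = grep { length($_) && $target =~ /\Q$_\E/i } @words;
    return unless @matches;

    return (sort { length($a) <=> length($b) } @matches)[-1];
}

my $best = closest_match('fallen', qw(fal falle fall));
print defined $best
    ? "The closest match is: $best\n"
    : "No matches found\n";
```

With this guard an unmatched target produces "No matches found" instead of a warning and an empty line.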

      OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:

      Update 1 (1st January, 2013):

      Algorithm::Diff is actually the wrong module for this; I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:

          #! perl
          use strict;
          use warnings;
          use String::LCSS;

          use constant
          {
              CASE_SENSITIVE  => 0,
              DICTIONARY_FILE => 'words.txt',
          };

          print 'Enter the target word: ';
          chomp(my $orig_target = <STDIN>);
          my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target;

          open(my $in, '<', DICTIONARY_FILE)
              or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $!";

          my %substrings;

          while (my $orig_word = <$in>)
          {
              chomp $orig_word;
              my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word;
              my @lcss = lcss($word, $target);
              $substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $lcss[0];
          }

          close $in or die "Cannot close file '" . DICTIONARY_FILE . "': $!";

          print 'Target: ', $orig_target, "\n";

          if (%substrings)
          {
              my $key    = (sort { length $a <=> length $b } keys %substrings)[-1];
              my $match  = $substrings{ $key }->[0];
              my $index2 = $substrings{ $key }->[2];
              my $substr = substr($orig_target, $index2, length $key);
              print 'Closest match: ', $match, "\n";
              print 'Longest common substring: ', $substr, "\n";
          }
          else
          {
              print "No matches found\n";
          }

          sub lcss
          {
              my ($first, $second) = @_;
              $first  .= '$';    # force strings to be different:
              $second .= '@';    # kludge required by String::LCSS::lcss
              my @results = String::LCSS::lcss($first, $second);
              return wantarray ? @results : $results[0];
          }
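To illustrate the distinction drawn in Update 1 between sub-sequences and substrings, here is a small module-free sketch. Both subs are hypothetical hand-written dynamic-programming versions for illustration only; they are not the Algorithm::Diff or String::LCSS interfaces.

```perl
use strict;
use warnings;

# Longest common substring: characters must be adjacent in both strings.
sub longest_common_substring {
    my ($s, $t) = @_;
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x (length($t) + 1);

    for my $i (1 .. length($s)) {
        my @curr = (0) x (length($t) + 1);
        for my $j (1 .. length($t)) {
            next unless substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            # Extend the common run that ended at the previous characters.
            $curr[$j] = $prev[$j - 1] + 1;
            ($best_len, $best_end) = ($curr[$j], $i)
                if $curr[$j] > $best_len;
        }
        @prev = @curr;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

# Length of the longest common subsequence: characters keep their
# order but may have gaps between them.
sub lcs_length {
    my ($s, $t) = @_;
    my @prev = (0) x (length($t) + 1);

    for my $i (1 .. length($s)) {
        my @curr = (0);
        for my $j (1 .. length($t)) {
            if (substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)) {
                $curr[$j] = $prev[$j - 1] + 1;
            }
            else {
                $curr[$j] = $prev[$j] > $curr[$j - 1]
                    ? $prev[$j]
                    : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print longest_common_substring('cricket', 'crocket'), "\n";
print lcs_length('cricket', 'crocket'), "\n";
```

For 'cricket' and 'crocket' the longest common substring is 'cket' (4 characters), whereas the longest common subsequence is 'crcket' (6 characters), which is why a subsequence-based module gives misleading results here.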

      Update 2 (1st January, 2013):

      It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:

          18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc');"
          ab
          18:28 >

      (And see

      Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:

          sub lcss
          {
              my $strings = join "\0", @_;
              my $lcs;

              for my $n (1 .. length $strings)
              {
                  my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
                  last unless $strings =~ $re;
                  $lcs = $1;
              }

              return $lcs;
          }
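A quick check of this replacement against the input that tripped up String::LCSS. Note that this sub returns only the substring, whereas the script above also reads start positions from $lcss[1] and $lcss[2]. The wrapper lcss_with_indices below is a hypothetical addition (not part of BrowserUk's post) that recovers those positions with index(); treat it as a sketch.

```perl
use strict;
use warnings;

# BrowserUk's regex-based sub, as quoted in the post.
sub lcss {
    my $strings = join "\0", @_;
    my $lcs;
    for my $n (1 .. length $strings) {
        my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
        last unless $strings =~ $re;
        $lcs = $1;
    }
    return $lcs;
}

# Hypothetical wrapper: return (substring, index in first string,
# index in second string), or an empty list when nothing is shared.
sub lcss_with_indices {
    my ($first, $second) = @_;
    my $lcs = lcss($first, $second);
    return unless defined $lcs;
    return ($lcs, index($first, $lcs), index($second, $lcs));
}

my ($common, $i1, $i2) = lcss_with_indices('abxabcy', 'abc');
print "$common at $i1 and $i2\n";
```

Here the result is 'abc', found at offset 3 in the first string and offset 0 in the second, rather than the 'ab' reported by String::LCSS.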

      Update 3 (2nd January, 2013):

      Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        There is a problem: this finds only the longest common substring. Suppose my dictionary has the words crick and cricketer, and the test input is cricking. The match should be crick, but the program gives cricketer as the closest matching string.
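A probable explanation, judging from the script above: %substrings is keyed by the common substring, and both crick and cricketer share the substring "crick" with cricking, so the later dictionary word overwrites the earlier one. Below is a minimal sketch of one possible tie-break. The rule (longest common substring first, then the shorter dictionary word) and the sub name best_match are assumptions, not something specified in the thread.

```perl
use strict;
use warnings;

# Regex-based longest common substring (BrowserUk's approach, quoted
# earlier in the thread).
sub lcss {
    my $strings = join "\0", @_;
    my $lcs;
    for my $n (1 .. length $strings) {
        my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
        last unless $strings =~ $re;
        $lcs = $1;
    }
    return $lcs;
}

# Hypothetical selection rule: the longest common substring wins;
# on a tie, prefer the shorter dictionary word.
sub best_match {
    my ($target, @words) = @_;
    my ($best_word, $best_lcs);

    for my $word (@words) {
        my $lcs = lcss(lc $word, lc $target);
        next unless defined $lcs;

        if (   !defined $best_lcs
            || length($lcs) > length($best_lcs)
            || (   length($lcs) == length($best_lcs)
                && length($word) < length($best_word)))
        {
            ($best_word, $best_lcs) = ($word, $lcs);
        }
    }
    return ($best_word, $best_lcs);
}

my ($word, $common) = best_match('cricking', qw(crick cricketer));
print "Closest match: $word\n";
print "Longest common substring: $common\n";
```

With this rule the dictionary order no longer matters: crick is chosen whether it appears before or after cricketer.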
