Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^10: partial match between 2 files

by lakssreedhar (Acolyte)
on Dec 27, 2012 at 08:42 UTC ( #1010469=note: print w/ replies, xml ) Need Help??


in reply to Re^9: partial match between 2 files
in thread partial match between 2 files

I need to print the word in the file which has the maximum partial match to the inputted string.ie if the inputted string is fallen and the file has words fal,fall,falle,the value to be printed is falle.The code below gives errors

#!/usr/bin/perl-w use warnings; use strict; open my $fh1, '<', 'words' or die $!; my $a=$_; while (my $f1 = <$fh1> ) { chomp $f1; my $f2 = $a; chomp $f2; if($f2=~m/$f1/i){print"$f1\n";} }


Comment on Re^10: partial match between 2 files
Download Code
Re^11: partial match between 2 files
by Anonymous Monk on Dec 27, 2012 at 10:21 UTC

    The code below gives errors

    Maybe you should figure out what they mean :/ use diagnostics

Re^11: partial match between 2 files
by Athanasius (Monsignor) on Dec 27, 2012 at 16:36 UTC

    The line my$a=$_; does nothing useful, since $_ is uninitialised. As Anonymous Monk says, you need to read from STDIN. For example:

    #! perl use strict; use warnings; print "Enter the target word: "; chomp(my $target = <STDIN>); my $in_file = 'words.txt'; open(my $in, '<', $in_file) or die "Cannot open file '$in_file' for re +ading: $!"; my @matches; while (<$in>) { chomp; push @matches, $_ if $target =~ /$_/i; } close $in or die "Cannot close file '$in_file': $!"; @matches = sort { length $a <=> length $b } @matches; print "The closest match is: ", $matches[-1], "\n";

    If the file “words.txt” contains:

    fal falle fall

    then, when “fallen” is entered from the keyboard, the output of the above script is:

    2:31 >perl 454_SoPW.pl Enter the target word: fallen The closest match is: falle 2:31 >

    Hope that helps,

    Update: Fixed error in sort: changed > to <=>. Also changed the order of words in the input file.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      The code produces an error "Use of uninitialized value in print" when a word awiSayanA is compared with 2 words
      awiSayaM
      awiSayanZ
      present in dictionary.Output is not produced.

        OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:

        Update 1 (1st January, 2013):

        Algorithm::Diff is actually the wrong module for this, I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:

        #! perl use strict; use warnings; use String::LCSS; use constant { CASE_SENSITIVE => 0, DICTIONARY_FILE => 'words.txt', }; print 'Enter the target word: '; chomp(my $orig_target = <STDIN>); my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target; open(my $in, '<', DICTIONARY_FILE) or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $! +"; my %substrings; while (my $orig_word = <$in>) { chomp $orig_word; my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word; my @lcss = lcss($word, $target); $substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $l +css[0]; } close $in or die "Cannot close file '" . DICTIONARY_FILE . "': $!"; print 'Target: ', $orig_target, "\n"; if (%substrings) { my $key = (sort { length $a <=> length $b } keys %substrings)[- +1]; my $match = $substrings{ $key }->[0]; my $index2 = $substrings{ $key }->[2]; my $substr = substr($orig_target, $index2, length $key); print 'Closest match: ', $match, "\n"; print 'Longest common substring: ', $substr, "\n"; } else { print "No matches found\n"; } sub lcss { my ($first, $second) = @_; $first .= '$'; # force strings to be different: $second .= '@'; # kludge required by String::LCSS::lcss my @results = String::LCSS::lcss($first, $second); return wantarray ? @results : $results[0]; }

        Update 2 (1st January, 2013):

        It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:

        18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc') +;" ab 18:28 >

        (And see http://cpanratings.perl.org/dist/String-LCSS.)

        Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:

        sub lcss { my $strings = join "\0", @_; my $lcs; for my $n (1 .. length $strings) { my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1); last unless $strings =~ $re; $lcs = $1; } return $lcs; }

        Update 3 (2nd January, 2013):

        Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1010469]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (7)
As of 2014-09-19 16:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (143 votes), past polls