Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

off-by-one string comparison

by argv (Pilgrim)
on Jun 05, 2005 at 17:32 UTC ( #463739=perlquestion: print w/ replies, xml ) Need Help??
argv has asked for the wisdom of the Perl Monks concerning the following question:

I want to see if a given string is off by 1 character from the test string. Thus, If my test string is "foo bar baz", I want to have a function that matches any one of:

fooo bar baz foo bar baaz foo bbar baz [etc]

Here, I'm looking for off-by-one errors. Extra bonus for a function that matches if it's off by an arbitrary N. That is, a function that returned the number of characters that were different between the strings rather than just a binary true/false.

Comment on off-by-one string comparison
Download Code
Re: off-by-one string comparison
by Zaxo (Archbishop) on Jun 05, 2005 at 17:42 UTC

    Text::Levenshtein implements a well-known metric for difference between strings.

    $ perl -MText::Levenshtein=distance -e'print distance( "foo bar baz"," +fooo bar baz"),$/' 1 $

    So your off-by-one test could be written,

    use Text::Levenshtein qw/distance/; sub off_by_one { 1 == distance("foo bar baz", shift); }
    For extra credit, leave out the "1 ==" part ;-)

    After Compline,
    Zaxo

Re: off-by-one string comparison
by bobf (Monsignor) on Jun 05, 2005 at 17:46 UTC

    Text::Levenshtein (which also has an XS version) will give the "edit distance" between two strings. There are several other modules that you could consider, including Text::PhraseDistance (which takes into consideration word order within the string) and String::Approx. Search CPAN - I'm sure there are others.

    HTH

Re: off-by-one string comparison
by crashtest (Curate) on Jun 05, 2005 at 19:15 UTC
    This is definetely a problem to go to CPAN for, but in the interest of wheel-reinvention, here is an implementation of my interpretation of your question. The function below matches "sticky-finger" typos, i.e. it matches strings where any number of characters were duplicated (up to a specifiable number of times). That seemed to be the gist of your question... maybe you could provide strings that shouldn't match, too?
    use strict; use warnings; my $input = 'foo bar baz'; my @test_strings = ( 'fooo bar baz', 'foo bar baaz', 'foo bbar baz', 'ffoo baar baz', # Matches: Should it? 'fxo bar baz', # No match: Different character, not dup. 'foo baaar baz' # No match: Too many 'a's! ); my $fuzziness = 1; # Let's try out the function for (@test_strings){ if (fuzzy_match($input, $_, $fuzziness)){ print "MATCHED: $_\n"; } else{ print "NO MATCH: $_\n"; } } sub fuzzy_match{ my ($input, $test_string, $fuzziness) = @_; # Build a regex from the input string my $regex = ''; for (split //, $input){ $regex .= quotemeta($_) . "\{1," . ($fuzziness + 1) . "}?"; } $test_string =~ m/$regex/; }

    As for the number of characters different, I think that would be length($test_string) - length($input_string).

Re: off-by-one string comparison
by TedPride (Priest) on Jun 05, 2005 at 22:12 UTC
    use strict; use warnings; my $str = 'foo bar baz'; while (<DATA>) { chomp; print "$_\n" if ob1($str, $_); } sub ob1 { my ($w1, $w2) = @_; return 0 if (abs(length($w1) - length($w2)) > 1); return 1 if $w1 eq $w2; if (length($w1) == length($w2)) { my ($i, $c); for ($i = 0; $i < length($w1); $i++) { $c++ if substr($w1, $i, 1) ne substr($w2, $i, 1); return 0 if $c && $c > 1; } return 1; } if (length($w1) > length($w2)) { my $t = $w1; $w1 = $w2; $w2 = $t; } $w1 = join '.?', '', split(//, $w1), ''; return 1 if $w2 =~ /$w1/; return 0; } __DATA__ fooo bar baz foo bar baaz foo bbar baz foo bar baz foo bar bz fob bar biz fooo bar bazz
    This could also be written without regex, since you only really care about one character difference. Just match the characters both ways if there is one character difference in length, and if the matched characters total is equal to the length of the smaller string, then there's only one character difference.
    adbc abc a eq a -> 1 d ne b -> c eq c <- 2 b eq b <- 3 d ne a <-
    Total matches = length of abc.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://463739]
Approved by sweetblood
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2014-11-01 04:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (227 votes), past polls