PerlMonks  

Re: Comparing Approximate Items

by dree (Monsignor)
on Jan 08, 2003 at 23:11 UTC (#225388)


in reply to Comparing Approximate Items

You could use Text::Levenshtein

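It computes an edit distance, i.e. a measure of how close two strings are: the minimum number of single-character inserts, deletes and substitutions needed to turn one string into the other.
So for example, distance("foo","four") is 2, because you need one "SUBSTITUTE" edit (o -> u) and one "INSERT" edit (r).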
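
To make the "foo"/"four" example concrete, here is a minimal pure-Perl sketch of the usual dynamic-programming way to compute such a distance. The name edit_distance is made up for illustration; this is not the source of Text::Levenshtein, just an assumption about how a unit-cost edit distance can be computed.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Unit-cost edit distance: insert, delete and substitute each cost 1.
# edit_distance() is an illustrative name, not the Text::Levenshtein API.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev is the row for the previous prefix of $s:
    # $prev[$j] = distance from that prefix to the first $j chars of $t.
    my @prev = (0 .. scalar(@t));
    for my $i (1 .. scalar(@s)) {
        my @cur = ($i);    # turning $i chars into the empty string: $i deletes
        for my $j (1 .. scalar(@t)) {
            my $subst = $prev[$j - 1] + ($s[$i - 1] eq $t[$j - 1] ? 0 : 1);
            push @cur, min($prev[$j] + 1,       # delete from $s
                           $cur[$j - 1] + 1,    # insert into $s
                           $subst);             # substitute or match
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print edit_distance("foo", "four"), "\n";    # prints 2
```

Only two rows of the table are kept at a time, which is enough when just the final distance is needed.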

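As for the algorithm, I suggest looking at the 'Stable Marriage Problem': its solution is a matching algorithm that pairs up two sets so as to best fit the "marriage preferences" of their members. Here the edit distances could serve as the preferences.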
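
A rough sketch of how that could look, assuming two lists of equal size and using the edit distance as the preference (closer strings are preferred). The matching step is the classic Gale-Shapley procedure; the names edit_distance and stable_match, and the sample word lists, are invented for this illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Illustrative unit-cost edit distance; Text::Levenshtein's distance()
# could be used here instead.
sub edit_distance {
    my @s = split //, $_[0];
    my @t = split //, $_[1];
    my @prev = (0 .. scalar(@t));
    for my $i (1 .. scalar(@s)) {
        my @cur = ($i);
        for my $j (1 .. scalar(@t)) {
            my $subst = $prev[$j - 1] + ($s[$i - 1] eq $t[$j - 1] ? 0 : 1);
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $subst);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Gale-Shapley stable matching between two equal-sized lists of strings.
# Every string prefers partners at a smaller edit distance; ties are
# broken by list position. Returns [left, right] pairs in @$left order.
sub stable_match {
    my ($left, $right) = @_;
    my $n = @$left;
    die "lists must have the same size\n" unless $n == @$right;

    # $dist[$i][$j] = distance between left item $i and right item $j
    my @dist = map { my $l = $_; [ map { edit_distance($l, $_) } @$right ] } @$left;

    # Preference list of each left item: right indices, closest first.
    my @pref = map {
        my $d = $_;
        [ sort { $d->[$a] <=> $d->[$b] || $a <=> $b } 0 .. $n - 1 ]
    } @dist;

    my @next = (0) x $n;     # position in @pref each left item proposes to next
    my @engaged;             # $engaged[$j] = left index currently held by right $j
    my @free = (0 .. $n - 1);

    while (@free) {
        my $i     = shift @free;
        my $j     = $pref[$i][ $next[$i]++ ];
        my $rival = $engaged[$j];
        if (!defined $rival) {
            $engaged[$j] = $i;                   # right item was unmatched
        }
        elsif ($dist[$i][$j] < $dist[$rival][$j]
            || ($dist[$i][$j] == $dist[$rival][$j] && $i < $rival)) {
            $engaged[$j] = $i;                   # right item trades up
            push @free, $rival;
        }
        else {
            push @free, $i;                      # proposal rejected
        }
    }

    my @partner;
    $partner[ $engaged[$_] ] = $_ for 0 .. $n - 1;
    return map { [ $left->[$_], $right->[ $partner[$_] ] ] } 0 .. $n - 1;
}

my @pairs = stable_match([qw(color centre analyse)], [qw(center analyze colour)]);
print "$_->[0] <-> $_->[1]\n" for @pairs;
```

The result is stable in the sense that no two strings from opposite lists would both rather be paired with each other than with the partners they were given. It does not necessarily minimise the total distance over all pairs.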


Re: Re: Comparing Approximate Items
by tall_man (Parson) on Jan 09, 2003 at 01:31 UTC
    I think you are right. Text::Levenshtein is better in this case because String::Approx will match substrings of the input. One more thing: if substitutions are not allowed, only inserts and deletes, you could use Text::WagnerFischer and set the cost of substitution so high that it will never be used.
    use Text::Levenshtein;
    use Text::WagnerFischer;

    my $pat = 'AAB';
    my @lst = qw(ABAB ABBA ABB ABABAAB);
    my @dist1 = Text::Levenshtein::distance($pat, @lst);
    my @dist2 = Text::WagnerFischer::distance([0, 1, 100], $pat, @lst);
    my ($i, $item);
    $i = 0;
    foreach $item (@lst) {
        print "Levenshtein distance of $item to $pat is ",$dist1[$i],"\n";
        print "WagnerFischer distance of $item to $pat is ",$dist2[$i],"\n";
        $i++;
    }
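
    To see what the [0, 1, 100] weights do without installing either module, here is a small pure-Perl sketch of a weighted edit distance. The name weighted_distance is invented for this example, and the weight order (match, insert/delete, substitute) is assumed to mirror the array passed to Text::WagnerFischer above; it is not that module's actual code.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Weighted edit distance in the Wagner-Fischer style.
# $w is [match cost, insert/delete cost, substitution cost];
# weighted_distance() is an illustrative name, not a module API.
sub weighted_distance {
    my ($w, $s, $t) = @_;
    my ($match, $indel, $subst) = @$w;
    my @s = split //, $s;
    my @t = split //, $t;

    # First row: building each prefix of $t from nothing takes only inserts.
    my @prev = map { $_ * $indel } 0 .. scalar(@t);
    for my $i (1 .. scalar(@s)) {
        my @cur = ($i * $indel);
        for my $j (1 .. scalar(@t)) {
            my $diag = $prev[$j - 1]
                     + ($s[$i - 1] eq $t[$j - 1] ? $match : $subst);
            push @cur, min($prev[$j] + $indel, $cur[$j - 1] + $indel, $diag);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $pat = 'AAB';
for my $item (qw(ABAB ABBA ABB ABABAAB)) {
    printf "%-8s unit costs: %d  substitutions priced out: %d\n", $item,
        weighted_distance([0, 1, 1],   $pat, $item),
        weighted_distance([0, 1, 100], $pat, $item);
}
```

    With unit costs, ABB is a single substitution away from AAB (distance 1). Once a substitution costs 100, the cheapest route is one delete plus one insert, so the distance becomes 2.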
