Re: Comparing Approximate Items

by dree (Monsignor)
on Jan 08, 2003 at 23:11 UTC

in reply to Comparing Approximate Items

You could use Text::Levenshtein

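It computes the Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, so a smaller value means the strings are closer.
So for example, distance("foo","four") is 2 because you need one "SUBSTITUTE" edit (o to u) and one "INSERT" edit (r).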
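
To make the "foo"/"four" example concrete, here is a minimal pure-Perl sketch of the usual dynamic-programming edit distance with unit costs. The name edit_distance is a hypothetical helper for illustration, not part of Text::Levenshtein, and the module may well be implemented differently.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not from Text::Levenshtein): unit-cost edit distance
# computed row by row with the classic dynamic-programming recurrence.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning the empty prefix of $s into each prefix of $t takes $j inserts.
    my @prev = (0 .. @t);
    for my $i (1 .. @s) {
        my @cur = ($i);    # turning $i chars of $s into the empty string takes $i deletes
        for my $j (1 .. @t) {
            my $subst = $prev[$j - 1] + ($s[$i - 1] eq $t[$j - 1] ? 0 : 1);
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $subst);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print edit_distance('foo', 'four'), "\n";    # prints 2
```

Only two rows of the table are kept at a time, which is enough when you need the distance but not the edit script itself.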

As for the algorithm, I suggest looking at the 'Stable Marriage Problem': a matching problem whose solution pairs up the members of two sets so as to best fit the "marriage preferences" of each side.
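
One well-known way to solve the Stable Marriage Problem is the Gale-Shapley algorithm. The sketch below is only an illustration under stated assumptions: stable_match is a hypothetical helper, both sets are the same size, every preference list is complete, and the preference lists are given up front (in your case they might be built by sorting candidates by edit distance, which is not shown here).

```perl
use strict;
use warnings;

# Hypothetical helper: Gale-Shapley stable matching.
#   $prop_prefs: { proposer => [ acceptors, most preferred first ] }
#   $acc_prefs:  { acceptor => [ proposers, most preferred first ] }
# Returns a hash ref { acceptor => proposer }.
# Assumes equal-sized sets and complete preference lists.
sub stable_match {
    my ($prop_prefs, $acc_prefs) = @_;

    # $rank{acceptor}{proposer} = position in the acceptor's list (lower is better).
    my %rank;
    for my $acc (keys %$acc_prefs) {
        my $pos = 0;
        $rank{$acc}{$_} = $pos++ for @{ $acc_prefs->{$acc} };
    }

    my %next = map { $_ => 0 } keys %$prop_prefs;    # next list index each proposer tries
    my %engaged;                                     # acceptor => current proposer
    my @free = sort keys %$prop_prefs;

    while (@free) {
        my $p   = shift @free;
        my $acc = $prop_prefs->{$p}[ $next{$p}++ ];
        next unless defined $acc;                    # list exhausted: $p stays unmatched
        my $cur = $engaged{$acc};
        if (!defined $cur) {
            $engaged{$acc} = $p;                     # acceptor was free
        }
        elsif ($rank{$acc}{$p} < $rank{$acc}{$cur}) {
            $engaged{$acc} = $p;                     # acceptor trades up
            push @free, $cur;
        }
        else {
            push @free, $p;                          # rejected; $p tries the next choice later
        }
    }
    return \%engaged;
}

my %prop = (A => [qw(X Y Z)], B => [qw(X Z Y)], C => [qw(Y X Z)]);
my %acc  = (X => [qw(B A C)], Y => [qw(A C B)], Z => [qw(A B C)]);
my $match = stable_match(\%prop, \%acc);
print "$_ is matched with $match->{$_}\n" for sort keys %$match;
```

With these made-up preferences the result is X with B, Y with A, and Z with C. Note that the outcome favors the proposing side, so which set you let "propose" is a design choice.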

Replies are listed 'Best First'.
Re: Re: Comparing Approximate Items
by tall_man (Parson) on Jan 09, 2003 at 01:31 UTC
    I think you are right. Text::Levenshtein is better in this case because String::Approx will match substrings of the input. Here is one more thing. If substitutions are not allowed, only inserts and deletes, you could use Text::WagnerFischer to set the cost of substitution so high that it will not be used.
    use Text::Levenshtein;
    use Text::WagnerFischer;

    my $pat = 'AAB';
    my @lst = qw(ABAB ABBA ABB ABABAAB);

    my @dist1 = Text::Levenshtein::distance($pat, @lst);
    # Costs are [match, insert/delete, substitute]; 100 effectively rules out substitution.
    my @dist2 = Text::WagnerFischer::distance([0, 1, 100], $pat, @lst);

    my ($i, $item);
    $i = 0;
    foreach $item (@lst) {
        print "Levenshtein distance of $item to $pat is ", $dist1[$i], "\n";
        print "WagnerFischer distance of $item to $pat is ", $dist2[$i], "\n";
        $i++;
    }
