PerlMonks  

Compare String within an Array

by tosaiju (Acolyte)
on Aug 01, 2012 at 05:30 UTC ( [id://984698]=perlquestion )

tosaiju has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

I’m looking for a string comparison function that gives the distance/score between two strings, or between one string and each element of an array. I'd appreciate any thoughts or help.

I have a string (say, a person’s name) which is to be compared with a list of values (names). If any of them match, I want to show the closest match.

I have heard about an algorithm called “Damerau–Levenshtein distance”, and I also saw a related CPAN module, “Text::Levenshtein”. I'm wondering whether this would be the best option, or whether there are better modules for string comparison.

Thanks a lot for all your advice and help, dear Monks.

Re: Compare String within an Array
by davido (Cardinal) on Aug 01, 2012 at 05:59 UTC

    It seems like Text::Levenshtein has been canonical in this field with respect to Perl. However, it may not be as well maintained as it could be. Jarkko Hietaniemi (co-author of Mastering Algorithms with Perl), in the POD for String::Approx (which is not appropriate for string comparisons), also mentions Text::WagnerFischer and Text::PhraseDistance. I can no longer find the latter on CPAN. The Wagner-Fischer algorithm does look interesting, but what I learn from Wikipedia leads me to believe it's just one way of computing the Levenshtein distance. I have no idea whether Text::WagnerFischer would be more or less suitable for your purposes than Text::Levenshtein, but it might be worth a look to see if it fits your specific need. Sometimes, where results are similar, it just comes down to what fits better into your code design.
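To make the idea concrete, here is a minimal pure-Perl sketch of the Wagner-Fischer dynamic-programming approach to Levenshtein distance, used to pick the closest name from a list. It is not the code of any CPAN module; the names `levenshtein` and `closest_match` and the sample names are made up for illustration. It computes plain Levenshtein distance (insert, delete, substitute), so a transposition counts as two edits rather than one as it would under Damerau-Levenshtein.

```perl
use strict;
use warnings;

# Hypothetical helper: Levenshtein distance via the Wagner-Fischer table,
# keeping only the previous and current rows in memory.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning the empty prefix of $s into each prefix of $t.
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        my @curr = ($i);    # deleting the first $i characters of $s
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                         # substitute or keep
            $best = $prev[$j] + 1     if $prev[$j] + 1     < $best;   # delete
            $best = $curr[$j - 1] + 1 if $curr[$j - 1] + 1 < $best;   # insert
            push @curr, $best;
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper: return the candidate with the smallest distance
# to $query (case-insensitive). On a tie, the earliest candidate wins.
sub closest_match {
    my ($query, @candidates) = @_;
    my ($best, $best_dist);
    for my $name (@candidates) {
        my $d = levenshtein(lc $query, lc $name);
        ($best, $best_dist) = ($name, $d)
            if !defined $best_dist || $d < $best_dist;
    }
    return ($best, $best_dist);
}

my @names = ('Alice Johnson', 'Alicia Jonson', 'Bob Johnson');
my ($match, $dist) = closest_match('Alise Johnson', @names);
print "Closest match: $match (distance $dist)\n";
```

In real code you would probably want a cutoff (for example, reject matches whose distance is more than some fraction of the name length) so that a completely unrelated name is not reported as "closest".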


    Dave

      Thanks much, Dave. That's a wonderful piece of information.

        Your question got me digging again into my copy of Mastering Algorithms with Perl. I've heard it said that the book is getting dated, and that it is a little out of sync with the "best practices" of 2012. But algorithms themselves never really go out of style, and I still find it to be an excellent resource as long as you keep in mind that the algorithms are sound, but their implementations might deserve some Perlish modernization. I first read it around 2003-2004, but seldom a month passes that I don't refer back to it for one thing or another.

        Chapter nine of the book has a section called "String-Matching Algorithms". If you're in a Perl shop, someone in the office surely has a copy of this book. And if not, find a used copy on Amazon or something. It's worth it. There are a number of string matching algorithms discussed in the chapter. Some of them are used for "exact" matches, and some for "approximate".

        The next section in the same chapter discusses phonetic algorithms such as what is implemented by Text::Soundex. It also mentions Schwern's (at the time experimental) Text::Metaphone. I see that more recent releases of that module are no longer considered experimental.
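For a feel of what a phonetic code looks like, here is a rough pure-Perl sketch of a Soundex-style encoding. It is not the implementation from Text::Soundex or from the book; `simple_soundex` is a made-up name, and this version treats h and w like vowels, which is an assumption that differs from some Soundex variants. Names that sound alike are meant to end up with the same four-character code.

```perl
use strict;
use warnings;

# Consonant groups of the classic Soundex table; letters not listed
# (vowels, h, w, y) carry no code.
my %code;
@code{qw(b f p v)}         = (1) x 4;
@code{qw(c g j k q s x z)} = (2) x 8;
@code{qw(d t)}             = (3) x 2;
$code{l}                   = 4;
@code{qw(m n)}             = (5) x 2;
$code{r}                   = 6;

# Hypothetical helper: first letter plus up to three digits, padded with zeros.
sub simple_soundex {
    my ($name) = @_;
    (my $letters = lc $name) =~ s/[^a-z]//g;    # ASCII letters only
    return '' unless length $letters;

    my @chars  = split //, $letters;
    my $first  = shift @chars;
    my $result = uc $first;
    my $last   = $code{$first} // 0;            # first letter's code also suppresses repeats

    for my $ch (@chars) {
        my $c = $code{$ch} // 0;
        # Emit a digit only when it differs from the immediately preceding code.
        $result .= $c if $c && $c != $last;
        $last = $c;
    }
    return substr($result . '000', 0, 4);
}

for my $name (qw(Robert Rupert Knuth Kant)) {
    printf "%-8s %s\n", $name, simple_soundex($name);
}
```

A phonetic code like this answers "do these names sound alike?" rather than "how many edits apart are they?", so it complements an edit-distance score rather than replacing it.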

        The book gives a lot to chew on.


        Dave
