PerlMonks  

Comparing Strings

by Segfault (Scribe)
on May 08, 2001 at 03:18 UTC (node #78714)
Segfault has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a pretty newbie-ish question, but I've tried everything I can think of so far for this problem, and I've decided to beg for the help of the great monks. ;)

Basically, I want a function that takes two strings, compares them, and returns the percentage of difference between string A and string B. This is easy if you assume the strings are the same length and simply compare them character by character, but I'll be working with strings of varying lengths.

For example, if string A is "this is a really annoying piece of text" and string B is "a really annoying piece of text", a character-by-character comparison would badly understate how similar the two strings are: B is just A with the first eight characters ("this is ") removed, so every remaining character is shifted out of alignment.
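To put a number on that, here is a minimal sketch of the naive positional comparison, using a hypothetical helper `naive_diff_percent` (not from any module). It assumes positions past the end of the shorter string count as differences and divides by the longer length. On the two example strings only one position happens to line up, so the naive score says they are almost entirely different.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Naive comparison (hypothetical helper): walk both strings position by
# position and count positions whose characters differ. Positions past
# the end of the shorter string count as differences.
sub naive_diff_percent {
    my ($x, $y) = @_;
    my $len = max(length $x, length $y);
    return 0 if $len == 0;
    my $diffs = 0;
    for my $i (0 .. $len - 1) {
        my $cx = $i < length $x ? substr($x, $i, 1) : undef;
        my $cy = $i < length $y ? substr($y, $i, 1) : undef;
        $diffs++ unless defined $cx && defined $cy && $cx eq $cy;
    }
    return 100 * $diffs / $len;
}

my $x = "this is a really annoying piece of text";
my $y = "a really annoying piece of text";
printf "%.1f%% different\n", naive_diff_percent($x, $y);   # 97.4% different
```

That is 38 mismatches out of 39 positions, even though B appears verbatim inside A.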

Anyway, I was wondering what good approaches there are for this sort of comparison, so that I can get a fairly accurate idea of how closely one string relates to another in the project I'm working on.

Thanks in advance for any help.

Re: Comparing Strings
by runrig (Abbot) on May 08, 2001 at 03:25 UTC
Re: Comparing Strings
by no_slogan (Deacon) on May 08, 2001 at 03:26 UTC
    The String::Approx package can calculate the "edit distance" (number of edits to change one string to another).
    use String::Approx 'adist';
    my $dist = adist("pattern", $input);   # $input holds the string to compare against
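If you would rather not install a CPAN module, the classic Levenshtein edit distance is short enough to write by hand. This is a minimal sketch of the standard dynamic-programming algorithm, not String::Approx's implementation; `adist` is oriented toward approximate matching and its exact results may differ in detail. The helper names `levenshtein` and `diff_percent` are hypothetical, and normalizing by the longer string's length is an assumption (it keeps the result between 0 and 100, since the distance never exceeds that length).

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein distance: minimum number of single-character insertions,
# deletions and substitutions that turn $s into $t. Keeps only two rows
# of the dynamic-programming table.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. @t);          # distance from "" to each prefix of $t
    for my $i (1 .. @s) {
        my @cur = ($i);            # distance from first $i chars of $s to ""
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,           # delete $s[$i-1]
                $cur[$j - 1] + 1,        # insert $t[$j-1]
                $prev[$j - 1] + $cost,   # substitute (free if equal)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Percentage difference, assuming normalization by the longer string.
sub diff_percent {
    my ($s, $t) = @_;
    my $len = max(length $s, length $t);
    return $len ? 100 * levenshtein($s, $t) / $len : 0;
}

my $x = "this is a really annoying piece of text";
my $y = "a really annoying piece of text";
printf "%d edits, %.1f%% different\n", levenshtein($x, $y), diff_percent($x, $y);
# prints "8 edits, 20.5% different"
```

The eight edits are simply the deletion of "this is ", which matches the intuition that the strings are mostly the same.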
Re: Comparing Strings
by ezekiel (Pilgrim) on May 08, 2001 at 03:48 UTC

    Your problem is very similar to protein sequence homology searches. A protein can be represented as nothing more than a string from an alphabet of 20 letters. Sequence homology searches (which are crucial to biology and bioinformatics) basically attempt to find and score similarities between two or more such sequences.

    Various algorithms exist to do this, e.g. Needleman and Wunsch (Journal of Molecular Biology 48, p. 443) and the ever-popular BLAST (www.ncbi.nlm.nih.gov/blast). Of course, these solutions are designed for molecular biology and would require a lot of work to adapt them to general strings. My guess is you are looking for a simpler solution...
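The core of the Needleman and Wunsch approach (global alignment by dynamic programming) does carry over to plain strings. Below is a minimal sketch assuming a deliberately simple scoring scheme chosen for illustration: +1 for a match, -1 for a mismatch, -1 per gap. Real protein searches use substitution matrices and gap penalties instead. The names `nw_score` and `%SCORE` are hypothetical. Unlike an edit distance, a higher score here means the strings are more alike.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Illustrative linear scoring scheme (an assumption, not from the paper).
my %SCORE = (match => 1, mismatch => -1, gap => -1);

# Needleman-Wunsch global alignment score, two rows at a time.
sub nw_score {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    # Aligning "" with each prefix of $t costs one gap per character.
    my @prev = map { $_ * $SCORE{gap} } 0 .. @t;
    for my $i (1 .. @s) {
        my @cur = ($i * $SCORE{gap});
        for my $j (1 .. @t) {
            my $diag = $prev[$j - 1]
                + ($s[$i - 1] eq $t[$j - 1] ? $SCORE{match} : $SCORE{mismatch});
            $cur[$j] = max(
                $diag,                        # pair $s[$i-1] with $t[$j-1]
                $prev[$j] + $SCORE{gap},      # gap in $t
                $cur[$j - 1] + $SCORE{gap},   # gap in $s
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $x = "this is a really annoying piece of text";
my $y = "a really annoying piece of text";
print nw_score($x, $y), "\n";   # 23: 31 matched characters minus 8 gaps
```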

Re: Comparing Strings
by Segfault (Scribe) on May 12, 2001 at 21:11 UTC
    Thanks very much for the help; I owe you guys. ;)
