PerlMonks
Re: similar texts !? by Albannach (Monsignor)
on Jul 12, 2003 at 13:35 UTC ( [id://273624]=note )
Here's another vote for Text::Levenshtein, which I have found very handy for comparing strings (mostly for detecting data entry errors), especially strings with mixed letters and numbers, though I too wish I could get the XS version working.
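To make the idea concrete, here is a minimal sketch of the edit distance that Text::Levenshtein computes, written in plain Perl so it runs without the module installed. The sub name `edit_distance` and the sample part numbers are made up for illustration; with the module available, its exported `distance` function should be usable in place of this sub.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain-Perl Levenshtein distance using two rows of the DP table.
# (Illustrative stand-in, not the module's own implementation.)
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distances from the empty prefix of $s
    for my $i (1 .. length $s) {
        my @cur = ($i);             # deleting the first $i characters of $s
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical data-entry slip: two digits swapped in a part number.
print edit_distance('AB1234X', 'AB1243X'), "\n";    # prints 2
print edit_distance('kitten',  'sitting'), "\n";    # prints 3
```

A small distance relative to the string length is a reasonable hint that two entries were meant to be the same; where to put the threshold depends on your data.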
I'd also like to point out Text::Metaphone as a soundex on steroids, as I've found soundex to be too insensitive at times. Note, however, that Metaphone ignores everything but letters, which may limit its usefulness to you.
I think BrowserUk points out a serious problem in the case of MP3 files. However, most names I've seen use some sort of fairly standard separators between "fields" in the filename, so you could split each name into fields, compare every possible pairing of fields between two MP3 names, and take the best-scoring set of pairings as the most likely match. This will of course be much slower than comparing the entire name, but there are probably only 3 or 4 fields per name, so you shouldn't be looking at run times greater than the lifetime of the universe either.
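The field-pairing idea can be sketched roughly as follows. Everything here is an assumption for illustration: hyphens and underscores are taken as the separators, the sub names `fields` and `best_pairing` are invented, left-over fields are charged their full length, and a plain-Perl edit distance stands in for Text::Levenshtein so the snippet is self-contained.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain-Perl Levenshtein distance (stand-in for Text::Levenshtein).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Split a file name into lower-cased, trimmed fields.
# Assumed separators: runs of '-' or '_'.
sub fields {
    my ($name) = @_;
    $name =~ s/\.mp3\z//i;
    my @f = split /[-_]+/, lc $name;
    s/^\s+|\s+$//g for @f;
    return grep { length($_) } @f;
}

# Lowest total distance over every way of pairing fields of @$x with
# fields of @$y. Fields left unpaired are charged their full length.
sub best_pairing {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x) if @{$x} > @{$y};    # make $x the shorter list
    unless (@{$x}) {
        my $left = 0;
        $left += length($_) for @{$y};
        return $left;
    }
    my ($first, @rest) = @{$x};
    my $best;
    for my $i (0 .. $#{$y}) {
        my @others = @{$y}[ grep { $_ != $i } 0 .. $#{$y} ];
        my $total = edit_distance($first, $y->[$i]) + best_pairing(\@rest, \@others);
        $best = $total if !defined $best || $total < $best;
    }
    return $best;
}

# Same track, fields in a different order, with one typo ("Mony").
my @one = fields('Pink Floyd - Money.mp3');
my @two = fields('Mony - Pink Floyd.mp3');
print best_pairing(\@one, \@two), "\n";    # prints 1
```

The recursion tries every pairing, so with n fields per name it does on the order of n! distance calls per comparison; for 3 or 4 fields that is at most 24, which is what keeps the approach practical.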