PerlMonks
Re: Fast string similarity method by Random_Walk (Parson)
on May 29, 2007 at 19:09 UTC (#618033=note)
Each string is compared with every other string, and you are comparing words. As well as removing stopwords, as already suggested, you may get a speed-up if you replace the actual words with tokens and then compare the tokenised versions of the strings. Of course, this breaks the similarity between "dog" and "dogs" (stripping every word-final "s" before tokenising may be an acceptable fix) and, somewhat worse, it breaks the similarity between "run" and "ran" and no doubt many other grammatical form shifts. Whether it helps really depends on your data.

Cheers,
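A minimal sketch of the tokenising idea, written in Python for illustration rather than Perl. Several things here are assumptions, not part of the original post: the tiny `STOPWORDS` list, the crude word-final "s" stripping, and Jaccard overlap of token sets as the similarity measure; the helpers `tokenise` and `jaccard` are hypothetical names. Each distinct word is mapped once to a small integer, so the all-pairs comparison then works on sets of integers instead of strings.

```python
import re
from itertools import combinations

# Hypothetical stopword list for illustration; a real one would be much larger.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in"}

def tokenise(strings, strip_plural=True):
    """Map each distinct word to a small integer and return every string
    as a frozenset of those integers, with stopwords dropped."""
    vocab = {}   # word -> integer token, assigned in order of first appearance
    result = []
    for s in strings:
        ids = set()
        for word in re.findall(r"[a-z]+", s.lower()):
            if word in STOPWORDS:
                continue
            if strip_plural and len(word) > 1 and word.endswith("s"):
                # Crude: "dogs" -> "dog", but also "is" -> "i" (consistently, at least)
                word = word[:-1]
            ids.add(vocab.setdefault(word, len(vocab)))
        result.append(frozenset(ids))
    return result, vocab

def jaccard(a, b):
    """Assumed similarity measure: |a & b| / |a | b| over token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

texts = [
    "The dog runs in the park",
    "Two dogs run in a park",
    "A cat ran to the house",
]
tokens, vocab = tokenise(texts)
for i, j in combinations(range(len(texts)), 2):
    print(i, j, round(jaccard(tokens[i], tokens[j]), 2))
# 0 1 0.75
# 0 2 0.0
# 1 2 0.0
```

Here "runs" and "dogs" match their singular forms after stripping, while "ran" still gets a different token from "run", which is the limitation mentioned above. Whether integer tokens are actually faster than comparing the words directly depends on the language and the data, so it is worth benchmarking on your own strings.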
Pereant, qui ante nos nostra dixerunt! (May they perish who said our things before us!)
In section: Seekers of Perl Wisdom