Re: Fast string similarity method

by Random_Walk (Prior)
on May 29, 2007 at 19:09 UTC

in reply to Fast string similarity method

Each string is compared to every other string, and each comparison works on words. As well as removing stopwords as already suggested, you may get a speed-up if you replace the actual words with tokens and then compare the tokenised versions of the strings.
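A minimal sketch of the tokenising idea, in Python rather than Perl. The original similarity measure is not shown in this reply, so Jaccard overlap on word sets stands in for it here as an assumption; the function names `tokenise` and `jaccard` and the sample strings are made up for illustration.

```python
from itertools import combinations

def tokenise(strings):
    """Replace each distinct word with a small integer ID.

    Returns one frozenset of IDs per input string, plus the vocabulary.
    """
    vocab = {}
    tokenised = []
    for s in strings:
        # setdefault assigns the next free ID the first time a word is seen
        ids = frozenset(vocab.setdefault(w, len(vocab)) for w in s.lower().split())
        tokenised.append(ids)
    return tokenised, vocab

def jaccard(a, b):
    """Shared tokens divided by total distinct tokens (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample data
strings = ["the quick brown fox", "the quick red fox", "a lazy dog"]

# Tokenise once, then run the all-pairs comparison on integer sets only
tokenised, vocab = tokenise(strings)
scores = {
    (i, j): jaccard(tokenised[i], tokenised[j])
    for i, j in combinations(range(len(strings)), 2)
}
```

The words are split and looked up once per string, not once per pair, so the every-string-against-every-other loop only touches sets of integers. Whether that is a real saving depends on how the existing comparison is written.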

Of course this breaks the similarity between dog and dogs (removing every word-final s before tokenising may be an OK fix) and, somewhat worse, it breaks the similarity between run and ran, and no doubt many other grammatical form shifts. Whether it helps is really down to your data.
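The word-final s fix could look roughly like this, again in Python. The helper names `strip_terminal_s` and `normalise` are hypothetical, and the length and double-s guards are extra assumptions to spare short words such as "is" and words such as "glass"; the suggestion above is simply to drop every trailing s.

```python
def strip_terminal_s(word):
    """Naively drop one trailing 's' (guards are an assumption, see above)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalise(text):
    """Lower-case, split on whitespace, and strip word-final s from each word."""
    return [strip_terminal_s(w) for w in text.lower().split()]

# dog and dogs now collapse to the same word before tokenising...
plural_fixed = normalise("dogs") == normalise("dog")

# ...but irregular forms such as run / ran still stay distinct
irregular_fixed = normalise("ran") == normalise("run")
```

Here `plural_fixed` is true and `irregular_fixed` is false, which is the trade-off described above: plurals are cheap to fold together, other form shifts are not.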


Pereant, qui ante nos nostra dixerunt!
