I've been working on a somewhat similar method, and it produces pretty decent results. My only problem is deciding when to do the comparisons. If I do them at query time, whenever I want a similarity, I have to compare against every single set in my database each time, which seems suboptimal. But if I compute them when I add a new item, only the newly added item gets compared against the older ones; the old items never get updated to show that they are similar to the new item.
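A minimal sketch of the two options, assuming Jaccard similarity over sets of phrases (the post doesn't name its measure) and an in-memory dict as a hypothetical stand-in for the database. `SimilarityIndex`, `query_on_demand` and the `threshold` parameter are illustrative names, not an existing API. Because Jaccard is symmetric, the insert-time version can write each score in both directions, so an older item also picks up its similarity to a newcomer:

```python
from collections import defaultdict


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class SimilarityIndex:
    """Hypothetical in-memory stand-in for the database table of sets."""

    def __init__(self, threshold: float = 0.0):
        self.sets: dict[str, frozenset] = {}
        # Precomputed scores, stored in BOTH directions.
        self.similar: dict[str, dict[str, float]] = defaultdict(dict)
        self.threshold = threshold

    def add(self, key: str, phrases) -> None:
        """Insert-time strategy: compare the new set once against every
        existing set and record the score on both sides, so older items
        also learn about the newcomer. Scores at or below the threshold
        are not stored, which keeps the table sparse."""
        new = frozenset(phrases)
        for other_key, other in self.sets.items():
            score = jaccard(new, other)
            if score > self.threshold:
                self.similar[key][other_key] = score
                self.similar[other_key][key] = score
        self.sets[key] = new

    def most_similar(self, key: str, k: int = 5):
        """Read precomputed neighbours; no scan is needed at query time."""
        ranked = sorted(self.similar[key].items(),
                        key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


def query_on_demand(sets: dict, key: str, k: int = 5):
    """Query-time strategy: brute-force scan over every stored set."""
    target = sets[key]
    scores = [(other_key, jaccard(target, other))
              for other_key, other in sets.items() if other_key != key]
    scores.sort(key=lambda kv: kv[1], reverse=True)
    return scores[:k]
```

With three small sets added in the order a, b, c, `idx.most_similar("a")` ranks `c` first even though `a` was added before it, and the result matches `query_on_demand(idx.sets, "a")`.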
My dataset isn't giant, though: I'll probably have somewhere between 5k and 15k sets, adding 100-200 a day. Maybe I'm over-optimizing.
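To check whether a brute-force scan is actually too slow at this scale, one could time a single query over synthetic data of the stated size. A rough harness, again assuming Jaccard similarity; the random phrase IDs, vocabulary size and set size are made-up stand-ins for the real database, so the timing it prints says nothing about the real data until it is run against it:

```python
import heapq
import random
import time


def jaccard(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_synthetic_sets(n_sets, vocab_size=5000, phrases_per_set=20, seed=42):
    """Random stand-in data: each set holds `phrases_per_set` phrase IDs."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(vocab_size), phrases_per_set))
            for _ in range(n_sets)]


def brute_force_top_k(sets, target, k=10):
    """Score `target` against every stored set and keep the k best
    as (score, index) pairs."""
    return heapq.nlargest(k, ((jaccard(target, s), i)
                              for i, s in enumerate(sets)))


if __name__ == "__main__":
    sets = make_synthetic_sets(15_000)
    start = time.perf_counter()
    top = brute_force_top_k(sets, sets[0])
    elapsed = time.perf_counter() - start
    print(f"one query over {len(sets)} sets: {elapsed * 1000:.1f} ms")
    # Insert-time cost at the upper end of the stated numbers:
    # 200 new sets a day, each compared against ~15k existing ones.
    print(f"comparisons per day if computed on insert: {200 * len(sets):,}")
```

If one query comes back fast enough, computing similarities on demand avoids keeping a precomputed table in sync at all.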