Fuzzy matching to user's tastes? by Seumas, on Jun 27, 2003 at 15:21 UTC
Seumas has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to add a feature to my site that lets users see a list of items the system believes they may like, based on items they have selected before. I have no mathematical background, so I'm sure there are far superior ways to do this, with much greater accuracy, than what I'm planning to try. If you know of a site or node I should familiarize myself with, please point me to it. So far I haven't found much in the way of ideas or algorithms for this.
My plan, however, is to do something like this:
For example, if we have an item called "really ugly petite blue pants", we would compare each word against the user's records (which we've pulled out and stuffed into a hash). We find that "petite" and "pants" match near the top of the user's records (they are among the most frequently encountered words in the titles of items the user has selected in the past), so the item's score goes up. Then we check whether the item's category matches any category the user has previously selected items from. If it does, we add to its score. We do this for each item, and then the top N items on the entire site are displayed on a page for the user to look at.
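The scoring plan above can be sketched in a few lines. This is written in Python for brevity; the same structure maps directly onto Perl hashes. The function names (`build_profile`, `score_item`, `recommend`), the naive whitespace tokenizing, the choice to weight each matching word by how often it appears in the user's history, the `top_k` cutoff of 20 words, and the flat `category_bonus` of 5 are all illustrative assumptions, not part of the original plan.

```python
from collections import Counter
import heapq

def build_profile(history, top_k=20):
    """Summarize a user's history of (title, category) pairs.

    Returns the user's top_k most frequent title words mapped to their counts,
    plus the set of categories they have picked items from.
    """
    word_counts = Counter()
    categories = set()
    for title, category in history:
        word_counts.update(title.lower().split())  # naive whitespace tokenizing
        categories.add(category)
    top_words = dict(word_counts.most_common(top_k))
    return top_words, categories

def score_item(title, category, top_words, categories, category_bonus=5):
    """Add the user's count for each matching word, plus a bonus for a known category."""
    score = sum(top_words.get(word, 0) for word in title.lower().split())
    if category in categories:
        score += category_bonus
    return score

def recommend(items, history, n=10):
    """Return the n highest-scoring (title, category) items, best first."""
    top_words, categories = build_profile(history)  # built once, not per item
    return heapq.nlargest(
        n, items,
        key=lambda item: score_item(item[0], item[1], top_words, categories),
    )

# Hypothetical sample data for illustration only.
history = [
    ("petite black pants", "clothing"),
    ("petite red pants", "clothing"),
    ("blue jeans", "denim"),
]
items = [
    ("really ugly petite blue pants", "clothing"),
    ("garden hose", "outdoor"),
    ("blue jeans jacket", "denim"),
]

if __name__ == "__main__":
    for title, category in recommend(items, history, n=2):
        print(title, category)
```

With the sample data, the pants item scores 10 (petite 2 + blue 1 + pants 2, plus the clothing bonus) and the jacket scores 7, so those two come back in that order. `heapq.nlargest` keeps only N candidates in a small heap rather than holding and sorting every score.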
I think this would work. I'm not sure how well, but... it would work. My biggest fear is that this is a hell of a lot of processing to do! Especially if you're talking about users who may have hundreds, thousands or even tens of thousands of items in their history and a site with easily 5,000 items to match against. Imagine having to do the above steps 5,000 times and storing it all in a hash temporarily while you pick through what the top items are and build up the scores!
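A rough back-of-the-envelope count may put the processing fear in proportion. Let H be the number of items in the user's history, I the number of items on the site, and w the average number of words per title. The value w = 6 is a guess, not a measured figure. The work splits into building the word/category profile once, then one pass over the site (one lookup per title word plus one category lookup):

```latex
\begin{aligned}
\text{profile cost} &\approx H \cdot w = 10{,}000 \cdot 6 = 60{,}000 \text{ hash increments (once per user)} \\
\text{scoring cost} &\approx I \cdot (w + 1) = 5{,}000 \cdot 7 = 35{,}000 \text{ hash lookups}
\end{aligned}
```

Under those assumptions, both totals are in the tens of thousands of in-memory hash operations, which Perl typically handles in well under a second. The profile could also be cached between page views so that only the scoring pass runs per request.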
So I'm looking for improvements, alternatives... most any suggestions whatsoever.
Added <readmore> at author's request - dvergin 2003-06-27