Weighted results on a perlmonks search
by Dog and Pony (Priest) on May 08, 2002 at 20:29 UTC
Lately, with Super search not being what it was, I've more and more resorted to searching via the "basic search" - the box at the top of the page. I've found that searching that way often brings me the nodes I'd like to see, since *often* the good nodes have the right keywords in the title. However, it presents the results unsorted, which can put the node you want very far down the list. So I started thinking, why not weight the results for relevance? One hope is that we would also see fewer "asked and answered" questions from those who actually try to search first, but give up on the long list.
Update: The upgrades by ar0n (see below) have b0rked my code at the moment. :) No use trying it, since it will report "no results".
So, of course I whipped up a test.
Also, someone said that if you present some code, the change is more likely to happen - not that I am so sure this code can be put to any use, but at least it demonstrates what I mean. Since I don't have access to the PM code, this CGI (which is really just an example) does a search here on perlmonks, parses the result, and then weights each resulting title by how many of the search words it contains. It also weights down replies, which are probably less interesting (you can see the whole thread from the root node, and you don't know which reply will be the one you want). Maybe they should even be filtered out altogether.
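To make the idea concrete, here is a minimal sketch of that kind of weighting. It is not the script from this post: the names `weigh_title` and `rank_titles` are made up for illustration, and both the "Re:" test for spotting replies and the halving penalty are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: score one title against the query words.
# One point per query word found in the title (case-insensitive).
sub weigh_title {
    my ($title, @words) = @_;
    my $score = 0;
    for my $word (@words) {
        $score++ if $title =~ /\b\Q$word\E\b/i;
    }
    # Assumption: replies are recognised by a leading "Re:" (or "Re^2:")
    # and are halved; the real penalty could be anything.
    $score /= 2 if $title =~ /^\s*Re(?:\^\d+)?:/i;
    return $score;
}

# Hypothetical helper: return the titles ordered by descending weight.
sub rank_titles {
    my ($query, @titles) = @_;
    my @words = grep { length } split /\s+/, $query;
    my %score = map { $_ => weigh_title($_, @words) } @titles;
    # Ties fall back to alphabetical order so the output is repeatable.
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } @titles;
}

my @ranked = rank_titles(
    'sort an array',
    'Re: How do I sort an array?',
    'Array basics',
    'How do I sort an array?',
);
print "$_\n" for @ranked;
```

With these three sample titles, the root node that matches all three words comes first, its reply second, and the single-word match last.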
Queries that improved a lot in my tests include "difference between two dates", "exec ssi" and "sort an array". Feel free to test more "common" queries, on PM and with my script. Since PM returns results in a different order each time, results with the same "weight" will also be reordered upon retries.
If anyone feels like implementing this, it is the last part of the code that is of interest: the weighting code itself. I'd probably also sort on reputation if I had that easily accessible - but that would be more important for replies. The algorithm is probably not that good, but it is a start. :) I suppose there is other stuff to go on too if you have access to the result of a direct SQL query.
Anyhow, here is the code. Remember, it is something hacked together, be nice *grin*. Nah, bring it on if you have improvements:
Yes, I know. One big ugly sub is bad. I just didn't feel like fixing it at the moment. The same goes for creating links like that in the middle: it is just for the display, and should have no impact on the sorting. :)
Other suggestions might include doing such a search and presenting the five(?) best results when someone previews a post to SOPW. Something like: "Something similar has already been posted, maybe some of these will answer this question?" with an option to view the whole list as well. Or would the impact on the server be too high?
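A rough sketch of how the "five best" could be picked, assuming the results have already been scored somehow. The helper name `top_matches` and the hash shape with `title` and `score` keys are invented for the example, and dropping zero-score results is my own guess at what would be useful.

```perl
use strict;
use warnings;

# Hypothetical helper: pick at most $limit results worth suggesting.
# Each result is a hash ref with 'title' and 'score' keys (assumed shape).
sub top_matches {
    my ($limit, @results) = @_;
    # Drop results that matched nothing, then put the best scores first.
    my @sorted =
        sort { $b->{score} <=> $a->{score} || $a->{title} cmp $b->{title} }
        grep { $_->{score} > 0 } @results;
    # Truncate the list if it is longer than the limit.
    $#sorted = $limit - 1 if @sorted > $limit;
    return @sorted;
}

# Made-up sample data, just to show the call.
my @suggestions = top_matches(
    2,
    { title => 'Sorting arrays',   score => 1 },
    { title => 'Unrelated node',   score => 0 },
    { title => 'Sort an array',    score => 3 },
    { title => 'Array of hashes',  score => 1 },
);
print "$_->{title}\n" for @suggestions;
```

Capping the list keeps the preview page short, and the full list could still be offered behind a link.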
While I am at it, the HTML on the search page should really have <ul> ... </ul> around the search result, and should close the <li> tags. If I may be picky. :)
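For what it's worth, a small sketch of emitting the result list with the `<ul>` wrapper and closed `<li>` tags. This is only an illustration, not PM's actual code: `escape_html`, `results_as_list`, the hash shape, and the placeholder URL are all assumptions.

```perl
use strict;
use warnings;

# Escape the characters that are significant in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: render results as a closed <ul> with closed <li> items.
# Each result is a hash ref with 'url' and 'title' keys (assumed shape).
sub results_as_list {
    my (@results) = @_;
    my @items;
    for my $r (@results) {
        push @items, sprintf '  <li><a href="%s">%s</a></li>',
            escape_html($r->{url}), escape_html($r->{title});
    }
    return join "\n", '<ul>', @items, '</ul>';
}

# The URL below is a placeholder, not a real node.
print results_as_list(
    { url => '?node_id=1', title => 'Sorting <arrays> & lists' },
), "\n";
```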
Well, anyhow, maybe this could be put to some use? Otherwise, it is also available (such as it is) at http://dogandpony.perlmonk.org/cgi-bin/search_pm.pl.