PerlMonks  

Re: OT: Vector based search engine

by Ryszard (Priest)
on May 18, 2004 at 06:31 UTC (#354187)


in reply to OT: Vector based search engine

After looking at various algorithms and whatnot, I've settled on a (sigh) ready-made solution: swish-e.

It's pretty cool, and while written in C, it has a Perl API which is quite nice. It's extremely fast for filesystem index scans and terribly slow for web spidering (not surprising really; props to tachyon).

My site is a dynamic photo album site, not static HTML pages. As the content I want to index is stored on the filesystem, the scan takes less than a second with swish-e (as opposed to 33 hours of spidering).
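For anyone wanting to try the filesystem route, a minimal swish-e config might look something like the sketch below. The directory, the file extensions and the index location are all assumptions made up for illustration, not the settings used on the site described here.

```
# swish.conf: a hypothetical minimal config for filesystem indexing.
# All paths and extensions below are placeholders; adjust to your own layout.

# Directory tree to scan
IndexDir  /var/www/photos

# Only index files with these extensions
IndexOnly .txt .html

# Where to write the index
IndexFile /var/www/index.swish-e

# Build the index (filesystem access is the default method):
#   swish-e -c swish.conf
# Query it from the command line:
#   swish-e -f /var/www/index.swish-e -w 'sunset'
```

Spidering instead goes over HTTP for every page, which is where the large time difference comes from.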

I've learned a little more (not heaps) about search engines, and I now know a bit more about the different algorithms and their advantages and disadvantages.

Again, unfortunately, the Vector Space algorithm referenced above did not meet my needs: it returned completely irrelevant results. YMMV. I've not delved into the nitty-gritty of how or why, but it's on my list of things to do.
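For readers unfamiliar with the general idea, here is a rough sketch of vector space ranking in plain Perl. It is not the code referenced above and says nothing about why that code gave poor results; it is a generic TF-IDF plus cosine similarity toy. The corpus, the file names and the sub names (tokens, term_counts, weigh, cosine, search) are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: file name => caption text.
my %docs = (
    'beach.jpg'  => 'sunset over the beach with palm trees',
    'city.jpg'   => 'city skyline at night',
    'forest.jpg' => 'morning fog in the pine forest',
);

# Split text into lowercase word tokens.
sub tokens { return map { lc($_) } ( $_[0] =~ /(\w+)/g ) }

# Count how often each term occurs in one text.
sub term_counts {
    my %tf;
    $tf{$_}++ for tokens( $_[0] );
    return \%tf;
}

# Term counts per document, and document frequency per term.
my %tf_for = map { $_ => term_counts( $docs{$_} ) } keys %docs;
my %df;
for my $tf ( values %tf_for ) { $df{$_}++ for keys %$tf }
my $n = scalar keys %docs;

# Turn raw counts into tf * idf weights (smoothed idf = log(1 + N/df)).
sub weigh {
    my ($tf) = @_;
    my %w;
    for my $t ( keys %$tf ) {
        next unless $df{$t};    # skip terms the corpus has never seen
        $w{$t} = $tf->{$t} * log( 1 + $n / $df{$t} );
    }
    return \%w;
}

# Cosine of the angle between two sparse vectors stored as hashes.
sub cosine {
    my ( $x, $y ) = @_;
    my ( $dot, $nx, $ny ) = ( 0, 0, 0 );
    $dot += $x->{$_} * ( $y->{$_} // 0 ) for keys %$x;
    $nx  += $_**2 for values %$x;
    $ny  += $_**2 for values %$y;
    return 0 unless $nx && $ny;
    return $dot / sqrt( $nx * $ny );
}

# Rank documents against a query; documents scoring zero are dropped.
sub search {
    my ($query) = @_;
    my $qv    = weigh( term_counts($query) );
    my %score = map { $_ => cosine( $qv, weigh( $tf_for{$_} ) ) } keys %docs;
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b }
                 grep { $score{$_} > 0 } keys %score;
    return @ranked;
}

print join( ', ', search('beach sunset') ), "\n";    # prints "beach.jpg"
```

With a corpus this small the ranking is trivially sensible; on real data the quality depends heavily on tokenising, stop words and weighting, so YMMV here too.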

