Writing a site search engine

by SilverB1rd (Scribe)
on Aug 13, 2001 at 19:17 UTC (#104455, perlquestion)

SilverB1rd has asked for the wisdom of the Perl Monks concerning the following question:

I want to write a site search script. I have not worked with web bots before, so I'm not sure where to start, and I would like to learn more about what I'm getting into.

#1 What points do you need to consider when writing a site search bot?

#2 Someone suggested WWW::Robot, and it looks like a nice place to start, but it still looks more advanced than my level at the moment.

#3 How do you make fuzzy searches and rank the results?

If you can help me with any of these I would greatly appreciate it.

------
PT - Perl Tanks, a 100% Perl programming game
The Price of Freedom is Eternal Vigilance

Replies are listed 'Best First'.
Re: Writing a site search engine
by Hofmator (Curate) on Aug 13, 2001 at 20:09 UTC
Re: Writing a site search engine
by rucker (Scribe) on Aug 13, 2001 at 19:24 UTC
    Perhaps you should look at how an existing search engine does these things. Htdig (http://www.htdig.org) is the first that comes to mind, although I can't vouch for its value as an example.
Re: Writing a site search engine
by tune (Curate) on Aug 13, 2001 at 20:05 UTC
    Glimpse/Webglimpse is a good search engine too.

    Some tips if you want to develop it for yourself:

      You have to weight the words by where they come from. For example: words from the title: 100, words from the meta keywords: 75, words from headlines: 50, words from the document body: 10.

      Store the position of each word within the document (e.g. whether it falls in the first 100 words). When the user searches for two or more words, you can then compare the hits: a document in which the query words appear closer together should rank higher than one in which they are far apart. From this you can compute a rank.
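
    The two tips above could be combined roughly as follows. This is only a minimal sketch, not a tested search engine: the in-memory %index layout, the sub names (add_field, score, search), the sample documents and the proximity bonus of 50 are all assumptions made for illustration. Only the four field weights come from the tip itself.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Field weights taken from the tip above.
my %WEIGHT = (title => 100, keywords => 75, headline => 50, body => 10);
# Assumed value: bonus added for adjacent query words, divided by their distance.
my $PROXIMITY_BONUS = 50;

my %index;   # hypothetical layout: word => doc => [ [field, position], ... ]
my %docs;    # every document id seen so far

# Index one field of one document; positions count words within that field.
sub add_field {
    my ($doc, $field, $text) = @_;
    $docs{$doc} = 1;
    my $pos = 0;
    for my $word (lc($text) =~ /\w+/g) {
        push @{ $index{$word}{$doc} }, [ $field, $pos++ ];
    }
}

# Score one document against a query; 0 means some query word is missing.
sub score {
    my ($doc, @query) = @_;
    my @occ;
    for my $word (map { lc $_ } @query) {
        my $hits = $index{$word} && $index{$word}{$doc};
        return 0 unless $hits;
        push @occ, $hits;
    }
    my $score = 0;
    # Source weighting: each word counts once, at its best-weighted field.
    for my $hits (@occ) {
        $score += max(map { $WEIGHT{ $_->[0] } } @$hits);
    }
    # Proximity: for each adjacent pair of query words, take the smallest
    # distance between occurrences in the same field.
    for my $i (1 .. $#occ) {
        my @gaps;
        for my $left (@{ $occ[$i - 1] }) {
            for my $right (@{ $occ[$i] }) {
                push @gaps, abs($left->[1] - $right->[1])
                    if $left->[0] eq $right->[0];
            }
        }
        my $gap = min(grep { $_ > 0 } @gaps);
        $score += $PROXIMITY_BONUS / $gap if defined $gap;
    }
    return $score;
}

# Return matching document ids, best score first.
sub search {
    my @query = @_;
    my %score;
    $score{$_} = score($_, @query) for keys %docs;
    return sort { $score{$b} <=> $score{$a} } grep { $score{$_} > 0 } keys %score;
}

# Made-up sample documents.
add_field('engine.html',  title => 'Perl search engine');
add_field('engine.html',  body  => 'how to write a search engine in perl');
add_field('cooking.html', title => 'Cooking tips');
add_field('cooking.html', body  => 'perl is mentioned here and much later a search');

print "$_\n" for search('perl', 'search');
```

    With these sample documents, engine.html ranks first: both query words appear in its title and sit next to each other, while cooking.html only has them in the body, eight words apart. A real index would live on disk or in a database rather than in a hash, and the bonus formula would need tuning against real queries.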

    --
    tune
