PerlMonks  

search a foreign directory

by Anonymous Monk
on Apr 24, 2001 at 20:59 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a program that searches the documents on a foreign website, simply because nobody puts a search engine on their site themselves. I get how to search for words in documents on my own site, but how do I do it on someone else's? My dream is to be able to just put in: search:www.whatever.com for:perl, and then it will search all the documents in that directory for the word perl and send back links. IS THAT TOO MUCH TO ASK?!?!?! : )

Replies are listed 'Best First'.
Re: search a foreign directory
by suaveant (Parson) on Apr 24, 2001 at 21:04 UTC
    Well... the easy way would be to use something like wget, which supports recursive downloads, and then search the downloaded copy locally...

    Or you could write a web spider of your own in Perl using LWP and search the pages each time (or make a local copy, as with wget). It would probably be a good idea to cache the pages locally for a while and search the cached copies, then rebuild the links in the results so they point back to the actual site.
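The spider idea above can be sketched roughly as follows. This is a minimal illustration in Python rather than Perl/LWP, and every name in it (`PageParser`, `crawl_and_search`, the `fetch` callable, `max_pages`) is hypothetical. Fetching is passed in as a function, so the same loop could run against live pages or against a local cache; the sketch only follows links on the starting host and does a plain substring match.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects text content and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Note: this also picks up <script>/<style> contents; fine for a sketch.
        self.text_parts.append(data)


def crawl_and_search(start_url, term, fetch, max_pages=50):
    """Breadth-first crawl of one host; return URLs of pages containing term.

    fetch is any callable mapping a URL to HTML text, or None on failure
    (hypothetical interface: a live downloader or a local cache lookup).
    """
    host = urlparse(start_url).netloc
    queue, seen, hits = [start_url], {start_url}, []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.pop(0)
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        if term.lower() in " ".join(parser.text_parts).lower():
            hits.append(url)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return hits
```

For a quick offline trial, `fetch` can simply be the `get` method of a dict that maps URLs to saved HTML, which is also one way to search a cached copy instead of hitting the site each time.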

    Of course, Google and AltaVista have an option to search within a specific domain, so if the site is indexed there you could just use them :)

    Update: BTW, to use the domain searching in AV and Google, go to their Advanced Search pages.
                    - Ant

      You don't have to go to the advanced search page. AltaVista supports (amongst others) these nice little shortcuts:

      +host:domain.com - only search for results on this domain
      +link:domain.com - only search for results that link to domain.com

      The latter is useful when you want to see who's linked to your site :)

      Just include them along with your search term to restrict the results. This means you can easily build a search of a site by auto-populating a search box with the +host:domain.com string and letting users enter their term. Or use JavaScript to hide the +host: string and present an empty search box.
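As a rough illustration of the auto-populating idea, the sketch below appends the +host: restriction to whatever the user typed and URL-encodes the result. The function names are hypothetical, and the search engine's form URL and parameter name are not given in this thread, so only the query value itself is built.

```python
from urllib.parse import quote_plus


def site_query(term, domain):
    """Combine the user's term with the +host: restriction quoted above."""
    return f"{term} +host:{domain}"


def encoded_site_query(term, domain):
    """URL-encode the combined query so it can be placed in a query string."""
    return quote_plus(site_query(term, domain))


print(site_query("perl", "domain.com"))          # perl +host:domain.com
print(encoded_site_query("perl", "domain.com"))  # perl+%2Bhost%3Adomain.com
```

Encoding matters here because a literal + in a query string is read as a space, which would silently drop the operator.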

      cLive ;-)
