PerlMonks  

search a foreign directory

by Anonymous Monk
on Apr 24, 2001 at 20:59 UTC (#75137=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a program that searches the documents on a foreign website, simply because nobody puts a search engine on their site themselves. I get how to search for words in documents on my own site, but how do I do it on someone else's? My dream is to be able to just put in: search:www.whatever.com for:perl, and then it will search all the documents in that directory for the word perl and send back links. IS THAT TOO MUCH TO ASK?!?!?! : )

Re: search a foreign directory
by suaveant (Parson) on Apr 24, 2001 at 21:04 UTC
    Well... the easy way would be to use something like wget, which supports recursive downloads, and then search the downloaded copy locally...

    Or you could write a web spider of your own in Perl using LWP and search the pages each time (or make a local copy, as with wget). It would probably be a good idea to cache the pages locally for a while and search the cached copies, then rebuild each hit's link so that it points back to the actual site.
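
As a rough illustration of the spider idea above, here is a minimal sketch. It is written in Python with only the standard library rather than Perl/LWP, and every name in it (`LinkCollector`, `fetch`, `search_site`, the `max_pages` limit) is made up for this example. It assumes plain HTML pages, stays on the starting host, and matches the term against the raw page source, markup included.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its body as text (empty string on failure)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def search_site(start_url, term, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of one host; return URLs of pages containing term.

    fetch_page and max_pages are illustrative parameters: the first lets you
    swap in a cache or a local copy, the second keeps the crawl bounded.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    hits = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        fetched += 1
        # Case-insensitive match against the raw source (tags included).
        if term.lower() in html.lower():
            hits.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits
```

Calling `search_site("http://www.whatever.com/", "perl")` would return the list of matching page URLs on that host. Passing a different `fetch_page` is one way to plug in the local cache suggested above.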

    Of course, Google and AltaVista have an option to search within a specific domain, so if the site is indexed there you could just use them :)

    Update: BTW, to use the domain searching in AltaVista and Google, go to their Advanced Search pages.
                    - Ant

      You don't have to go to the Advanced Search page. AltaVista supports (amongst others) these nice little shortcuts:

      +host:domain.com - only search for results on this domain
      +link:domain.com - only search for results that link to domain.com

      The latter is useful when you want to see who's linked to your site :)

      Just include them along with your search term to restrict the results. This means you can easily create a search of the site by auto-populating a search box with the +host:domain.com string and letting users enter their term. Or use JavaScript to hide the +host: string and present an empty search box.
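
The auto-populating idea can also be done on the server side by prepending the restriction to whatever the user typed. The sketch below (Python, standard library only) shows just the string handling. The function names, the engine address `http://search.example/query`, and the `q` parameter name are placeholders, not the real AltaVista interface.

```python
from urllib.parse import urlencode


def site_query(term, domain):
    """Prefix the user's term with the +host: restriction described above."""
    return f"+host:{domain} {term}"


def search_url(engine_url, term, domain, param="q"):
    """Build a GET URL for a search engine.

    engine_url and param are assumptions for illustration; check the
    engine's own search form for the real address and field name.
    """
    return engine_url + "?" + urlencode({param: site_query(term, domain)})


print(site_query("perl", "domain.com"))
print(search_url("http://search.example/query", "perl", "domain.com"))
```

Note that `urlencode` percent-encodes the `+` and `:` characters, so the restriction reaches the engine intact instead of being read as a space.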

      cLive ;-)
