PerlMonks  

Web Spidering module?

by elbie (Deacon)
on Sep 23, 2001 at 02:05 UTC (#114103)
elbie has asked for the wisdom of the Perl Monks concerning the following question:

Is there an existing module for spidering a website and returning a list of URLs? I found this, but it would be nice to have a module that does all the dirty work for me so I don't have to worry about it.

elbie

Re: Web Spidering module?
by jryan (Vicar) on Sep 23, 2001 at 02:22 UTC
    Believe it or not, I started working on one just the other day... I'll post it when I am done (which will probably be the next chance I get a large chunk of time to code). However, if anyone knows if something like this is already in existence, please let me know so I can work on something else :)
Re: Web Spidering module?
by charnos (Friar) on Sep 23, 2001 at 02:26 UTC
    While I've never used the module myself, WWW::Robot seems to be exactly what you are looking for. It looks like a decent solution, but it has a lot of external module dependencies. Good luck spidering!
Re: Web Spidering module?
by Starky (Chaplain) on Sep 23, 2001 at 10:46 UTC
    I can't seem to dig up the specific reference, but there was a Perl Journal article a while back which, if I recall correctly, had a 4-line web spider.

    If someone out there could dig that up, it would be worth a read if only for amusement value :-)

        I found HTML::LinkExtor (also used in the one-line spider mentioned above) over the weekend, and it was quite helpful.

        Thanks everybody!

        elbie
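
For anyone landing on this thread later, here is a rough sketch of the link-extraction step with HTML::LinkExtor. It is not the code from the article or from anyone's post above. It only parses an HTML string, so fetching pages and walking the site are left out. The `extract_links` helper, the sample HTML, and the `www.example.com` URL are made up for illustration, and it assumes the HTML-Parser and URI distributions are installed from CPAN.

```perl
use strict;
use warnings;
use HTML::LinkExtor;   # part of the HTML-Parser distribution on CPAN
use URI;               # used to turn relative links into absolute ones

# Hypothetical helper: return the unique absolute URLs found in the
# <a href="..."> tags of $html. $base is the URL the page came from.
sub extract_links {
    my ($html, $base) = @_;
    my (%seen, @urls);

    # The callback receives the lowercase tag name followed by the
    # link attributes of that tag as key/value pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};

        my $url = URI->new_abs($attr{href}, $base);
        $url->fragment(undef);                 # drop any "#anchor" part
        my $str = $url->canonical->as_string;

        push @urls, $str unless $seen{$str}++; # keep first occurrence only
    });

    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Made-up sample page, so the sketch runs without network access.
my $html = <<'HTML';
<html><body>
<a href="/about.html">About</a>
<a href="news/today.html#top">News</a>
<a href="/about.html">About again</a>
<img src="/logo.png">
</body></html>
HTML

my @links = extract_links($html, 'http://www.example.com/index.html');
print "$_\n" for @links;
```

A spider would feed each returned URL back into a fetch-and-extract loop, skipping URLs it has already visited and staying on the starting host. That part is deliberately not shown here.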
