PerlMonks  

website archiver

by xorl (Deacon)
on Jan 13, 2009 at 04:19 UTC

xorl has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure this is one of those things that has already been coded and is out there somewhere. I'm looking for something that, given a URL, will spider the site and store a copy of it locally.

I have a pretty good idea of how to go about writing one. I'm just lazy. I figure if no one can find something out there, I can always take linklint and make it save the files after it checks them (is that a good idea?).

Thanks in advance.

Replies are listed 'Best First'.
Re: website archiver
by doom (Deacon) on Jan 13, 2009 at 04:31 UTC
    I think you're looking for the "wget" command. Myself, I tend to run it like this, but if you're doing this for archival purposes you might prefer to do it differently (e.g. without "-k" or "-H", and maybe without "-l"):
    wget -r -l 8 -w 100 -k -p -np -H <URL>
    Briefly, here is what these options do (read the man page for details):
    -r is recursive
    -l is the max depth
    -w is the wait, seconds between retrievals
    -k convert-links for local viewing
    -p gets all "page-requisites", e.g. images, stylesheets
    -np "no parent", means to avoid following links to levels above the starting point
    -H enable spanning across hosts when doing recursive retrieving
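
For the archival case mentioned above (dropping "-k" and "-H"), a minimal sketch of a wrapper might look like the following. The function name archive_site, the WGET override variable, and the argument order are assumptions made for illustration, not part of wget; the default depth and wait simply reuse the values from the command above.

```shell
#!/bin/sh
# Hypothetical helper: mirror a site for archival purposes.
# Unlike the command above, it omits -k (links stay as served)
# and -H (the crawl stays on the starting host).
#
#   $1  URL to start from (required)
#   $2  maximum recursion depth (default 8, as in the command above)
#   $3  seconds to wait between retrievals (default 100, as above)
#
# WGET is an assumed override so the command can be swapped out
# (for example, for a dry run); it defaults to plain "wget".
archive_site() {
    url=$1
    depth=${2:-8}
    wait_secs=${3:-100}

    if [ -z "$url" ]; then
        echo "usage: archive_site URL [DEPTH] [WAIT_SECONDS]" >&2
        return 2
    fi

    "${WGET:-wget}" -r -l "$depth" -w "$wait_secs" -p -np "$url"
}

# Example call (not run here):
#   archive_site "http://example.com/" 5 2
```

Because nothing runs until archive_site is called, the function can be sourced into an interactive shell and tried on a small site first; shortening the wait speeds things up at the cost of hitting the server harder.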
Re: website archiver
by Arunbear (Prior) on Jan 13, 2009 at 07:26 UTC
Re: website archiver
by Anonymous Monk on Jan 13, 2009 at 07:24 UTC
