
Website Mirroring

by willyyam (Priest)
on Feb 23, 2006 at 03:27 UTC (node #532149, perlquestion)
willyyam has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I am trying to mirror websites from the client end only, and I am running into a slight snag. I was using wget -prFNEkm (a wonderful set of flags, which does its level best to create a browseable local copy of a site), but it has a shortfall: stylesheets linked with the @import convention and images linked in stylesheets are not downloaded, and their URLs are not relativized for local viewing.

I was hoping that CPAN, Google or a Super Search would direct me to someone else who has solved this problem, but so far, no luck. Do any monks know a way to get a locally browseable copy of a website, other than hacking wget (written in C, which I don't speak) or extending it by hand with a Perl wrapper?

Update: The core of the problem is that wget doesn't parse CSS files for url()s, and doesn't retrieve stylesheets called via the @import convention. So I'm looking for an alternative.
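
To make the gap concrete, here is a minimal sketch of the step that is missing: pulling url() and @import references out of a stylesheet so that a wrapper could fetch them. The function name css_refs and the sample stylesheet are hypothetical, and the regex is a rough approximation rather than a real CSS parser (it ignores escapes and will also return data: URIs, which a caller would want to skip).

```perl
use strict;
use warnings;

# Return every URL referenced by a stylesheet, in order of appearance.
# Handles url(...) tokens (quoted or bare) and string-form import rules.
# css_refs is a hypothetical helper name, not part of wget or any module.
sub css_refs {
    my ($css) = @_;
    $css =~ s{/\*.*?\*/}{}gs;    # drop CSS comments first
    my @refs;
    while ($css =~ m{
          url\( \s* (["']?) ([^"')]+?) \1 \s* \)
        | \@import \s+ (["']) ([^"']+) \3
    }gxi) {
        # Group 2 is set by the url() branch, group 4 by the import branch.
        push @refs, defined $2 ? $2 : $4;
    }
    return @refs;
}

# Sample stylesheet, made up for illustration.
my $css = <<'CSS';
@import "base.css";
@import url(print.css);
body { background: url( '../img/bg.png' ) no-repeat; }
/* url(ignored.png) */
CSS

print "$_\n" for css_refs($css);
# prints base.css, print.css and ../img/bg.png, one per line
```

Each returned reference is still relative to the stylesheet it came from, so it would need to be resolved against that stylesheet's URL before fetching.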

Re: Website Mirroring
by spiritway (Vicar) on Feb 23, 2006 at 04:47 UTC

    You could replace -rN with -m, which already turns on recursion (-r) and time-stamping (-N). I don't think that will fix your problem, but it is the option intended for mirroring.

    A module that may be useful is URI. This might help if you're getting the stylesheets and images downloaded. Your post was a bit unclear as to whether you're getting any files, or whether they're not relativized.
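
    As a rough sketch of where URI could fit, assuming the module is installed from CPAN: new_abs resolves a reference found in a stylesheet against that stylesheet's URL, and rel goes the other way, which is the "relativize" step. The example.com URLs are made up for illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URLs, for illustration only.
my $sheet = 'http://example.com/css/site.css';

# Resolve a reference found inside the stylesheet to an absolute URL,
# which is what a downloader needs in order to fetch it.
my $abs = URI->new_abs('../img/bg.png', $sheet);
print "$abs\n";    # http://example.com/img/bg.png

# Turn the absolute URL back into one relative to the stylesheet,
# which is what a locally browseable copy needs.
my $rel = $abs->rel($sheet);
print "$rel\n";
```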

Re: Website Mirroring
by Anonymous Monk on Feb 23, 2006 at 12:59 UTC
    Try httrack; if it doesn't support URIs in CSS yet, file a feature request there. I had success with one some years ago.

      Excellent recommendation! This does all I need. Thank you kindly.
