PerlMonks  

Website Mirroring

by willyyam (Priest)
on Feb 23, 2006 at 03:27 UTC (#532149=perlquestion)
willyyam has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I am trying to mirror websites from the client end only, and am running into a slight snag. I was using wget -prFNEkm (a wonderful set of flags, which does its level best to create a browseable local copy of a site), but it has a shortcoming: stylesheets linked with the @import convention and images linked in stylesheets are not downloaded, and the links to them are not relativized for local viewing.

I was hoping that CPAN, Google or a Super Search would direct me to someone else who has solved this problem, but so far, no luck. Do any monks know a way (other than hacking wget, which is written in C, which I don't speak, or extending it by hand with a Perl wrapper) to get a locally browseable copy of a website?

Update: The core of the problem is that wget doesn't parse CSS files for url()s, and doesn't retrieve stylesheets called via the @import convention. So I'm looking for an alternative.
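To make the gap concrete, here is a minimal sketch of the step wget skips: scanning a stylesheet for url() and @import references and resolving them against the stylesheet's own URL. It is in Python purely for illustration; the function name css_references and the example.com URLs are hypothetical, and the regular expressions are an approximation that does not handle CSS comments or escaped characters.

```python
import re
from urllib.parse import urljoin

# url(...) with optional quotes; this also catches "@import url(...)".
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# @import "file.css" or @import 'file.css' (the form without url()).
IMPORT_RE = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def css_references(css_text, base_url):
    """Return absolute URLs referenced by a stylesheet, in first-seen order.

    base_url is the URL the stylesheet itself was fetched from, since
    relative references in CSS resolve against the stylesheet, not the page.
    """
    refs = []
    for pattern in (IMPORT_RE, URL_RE):
        for match in pattern.finditer(css_text):
            target = match.group(2).strip()
            # Skip empty references and inline data: URIs (nothing to fetch).
            if not target or target.startswith("data:"):
                continue
            absolute = urljoin(base_url, target)
            if absolute not in refs:
                refs.append(absolute)
    return refs


if __name__ == "__main__":
    sample = (
        '@import "base.css";\n'
        "@import url(print.css);\n"
        'body { background: url("../img/bg.png"); }\n'
    )
    for ref in css_references(sample, "http://example.com/css/main.css"):
        print(ref)
```

A wrapper around wget would run something like this over every downloaded .css file, fetch whatever comes back, and repeat for any newly fetched stylesheets.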

Re: Website Mirroring
by spiritway (Vicar) on Feb 23, 2006 at 04:47 UTC

    You could replace -rN with -m. I don't think that will fix your problem, but it's used for mirroring.

    A module that may be useful is URI. This might help if you're getting the stylesheets and images downloaded. Your post was a bit unclear as to whether you're getting any files, or whether they're not relativized.
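For the relativizing half of the problem, the idea is to rewrite each absolute link as a path relative to the file that contains it. The sketch below shows that calculation in Python for illustration only; relative_link is a hypothetical helper, the URLs are made up, and it assumes the local copy mirrors the site's directory layout. As far as I know, the URI module offers a comparable operation in Perl.

```python
import posixpath
from urllib.parse import urlsplit


def relative_link(from_url, to_url):
    """Return to_url expressed relative to the document at from_url.

    Links that point at a different scheme or host are returned unchanged,
    because they are not part of the local mirror.
    """
    src, dst = urlsplit(from_url), urlsplit(to_url)
    if (src.scheme, src.netloc) != (dst.scheme, dst.netloc):
        return to_url
    # Relative links resolve against the directory holding the source file.
    start = posixpath.dirname(src.path) or "/"
    return posixpath.relpath(dst.path, start)


if __name__ == "__main__":
    print(relative_link("http://example.com/css/main.css",
                        "http://example.com/img/bg.png"))
```

Query strings and fragments are dropped in this sketch, so a real wrapper would need to decide how to map those onto local file names.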

Re: Website Mirroring
by Anonymous Monk on Feb 23, 2006 at 12:59 UTC
    Try httrack; if it doesn't support URIs in CSS yet, file a feature request there. I had success with it some years ago.

      Excellent recommendation! This does all I need. Thank you kindly.
