Download web page including css files, images, etc. by skx (Parson)
on Jan 25, 2007 at 14:09 UTC
skx has asked for the
wisdom of the Perl Monks concerning the following question:
I would like to download a complete webpage including any referenced .css, .js, and images - and have the page rewritten to reference the local copies.
However, to complicate matters, I wish to mandate that the initial file will be saved as "index.html", regardless of what it was originally called.
This appears to rule wget out, as the --output-document=index.html option trumps the --page-requisites flag (which is used to download the images, etc. that the page references).
Now, using LWP I can download the remote URI, and I assume that I could parse the links out with HTML::Parser or HTML::TreeBuilder. However, this seems like a very simple request, so I wondered whether there are any existing libraries to do this kind of thing?
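A minimal sketch of the LWP-plus-parser approach described above might look like the following. It is only an illustration under stated assumptions: the helper name localise_page, the %REQUISITE_ATTR table, and the numbered local file names are all hypothetical, and only img, script, and stylesheet link references are handled (no url() references inside CSS, no srcset, no frames).

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

# Tag => attribute holding a page requisite (deliberately incomplete).
my %REQUISITE_ATTR = ( img => 'src', script => 'src', link => 'href' );

# Hypothetical helper: rewrite requisite references in $html to local
# file names. Returns the new HTML and a hashref of absolute URL => name.
sub localise_page {
    my ( $html, $base ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my ( %local_for, $count );

    for my $el ( $tree->look_down( sub { exists $REQUISITE_ATTR{ $_[0]->tag } } ) ) {
        # Of the <link> elements, only stylesheets are wanted.
        next if $el->tag eq 'link'
            && lc( $el->attr('rel') // '' ) ne 'stylesheet';

        my $attr = $REQUISITE_ATTR{ $el->tag };
        my $ref  = $el->attr($attr);
        next unless defined $ref && length $ref;

        # Resolve against the page's base; skip data:, javascript:, etc.
        my $abs = URI->new_abs( $ref, $base );
        next unless ( $abs->scheme // '' ) =~ /^https?$/;
        my $key = $abs->as_string;

        unless ( exists $local_for{$key} ) {
            # Name the copy after the last path segment, numbered so
            # that same-named files from different directories differ.
            my ($name) = $abs->path =~ m{([^/]+)\z};
            $name //= 'file';
            $name =~ s/[^\w.-]/_/g;
            $local_for{$key} = sprintf '%03d-%s', ++$count, $name;
        }
        $el->attr( $attr, $local_for{$key} );
    }

    my $out = $tree->as_HTML( '<>&', '  ', {} );
    $tree->delete;
    return ( $out, \%local_for );
}

sub main {
    my ( $url, $dir ) = @_;
    die "usage: $0 URL DIR\n" unless defined $url && defined $dir;
    mkdir $dir unless -d $dir;

    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $res = $ua->get($url);
    die 'Fetch failed: ' . $res->status_line . "\n" unless $res->is_success;

    my ( $html, $files ) = localise_page( $res->decoded_content, $res->base );

    # The page itself is always saved as index.html.
    open my $fh, '>:encoding(UTF-8)', "$dir/index.html" or die "open: $!";
    print {$fh} $html;
    close $fh or die "close: $!";

    # mirror() saves each requisite under its local name.
    $ua->mirror( $_, "$dir/$files->{$_}" ) for keys %$files;
}

# Run only when invoked directly with arguments.
main(@ARGV) if !caller && @ARGV;
```

Saved under a hypothetical name such as savepage.pl, it would be run as "perl savepage.pl http://example.com/page.html outdir". Keeping the rewriting in its own subroutine means it can be exercised on a literal HTML string without touching the network.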
Searching CPAN for "http rewrite" and "http mirror" didn't find anything that seems suitable, but any pointers would be gratefully received in case I'm searching for the wrong terms. (Similarly, if curl, wget, httrack, etc. can do this with clever options, I'm not 100% committed to using Perl!)