Re: Download web page including css files, images, etc.

by starX (Chaplain)
on Jan 25, 2007 at 14:54 UTC (#596501)

in reply to Download web page including css files, images, etc.

I would take a look at HTTP::Lite. It should be easy enough for Perl to download a web page, save it to disk as index.html, do a regexp scan for the files you're looking for, and then start downloading all the other items. Something like...
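    use strict;
    use warnings;
    use HTTP::Lite;
    use File::Basename;

    my $http = HTTP::Lite->new;
    my $url  = "";    # URL of the page to mirror
    my $req  = $http->request($url) or die "Unable to get document: $!";
    my $mirror_home = '/home/user/mirror_home';
    my $page = $http->body();    # fetch the body once, not in a loop
    my (@javascript, @css, @jpg);

    # Crude regexp scan of src/href attributes, sorted by file extension.
    while ($page =~ m/(?:src|href)\s*=\s*["']([^"']+)["']/gi) {
        my $link = $1;
        if    ($link =~ m/\.jpg$/i) { push @jpg, $link; }
        elsif ($link =~ m/\.js$/i)  { push @javascript, $link; }
        elsif ($link =~ m/\.css$/i) { push @css, $link; }
    }

    open my $fh, '>', "$mirror_home/index.html"
        or die "Couldn't open $mirror_home/index.html : $!";
    print $fh $page;
    close $fh;

    # Assumes each link is an absolute URL; relative links would have to
    # be resolved against $url first.
    for my $file (@css) {
        $http->reset();    # clear state before reusing the object
        $req = $http->request($file) or die "Unable to get $file: $!";
        my $name = basename($file);
        open my $out, '>', "$mirror_home/$name"
            or die "Couldn't open $mirror_home/$name : $!";
        print $out $http->body();
        close $out;
    }
    # Then repeat for other extensions.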
As a fair warning, the above is definitely untested and probably horribly oversimplified, but the basic idea seems sound to me.

Replies are listed 'Best First'.
Re^2: Download web page including css files, images, etc.
by trendle (Novice) on Feb 08, 2012 at 03:25 UTC
    Yes, just in case anyone tries it: the code as first posted was untested and unfortunately had some bugs. Apart from syntax errors that are quickly fixed (e.g. it should be 'push @x, $_', not the reverse order), there was one HUGE problem. The while loop, as written, would keep downloading from the web page forever! There was no end condition, since $http->body() returns the whole page every time it is called. So I think this is a good starting point, but you then need to take the HTML returned by $http->body() and use an HTML parser to get the bits you want. Sorry, but I don't have the code for this at present. If I get something working I'll post it. But I thought it wise to warn the unwary.
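For anyone looking for the HTML-parser step mentioned above, here is a minimal sketch, not a tested mirroring tool. It assumes the CPAN modules HTML::LinkExtor (part of the HTML-Parser distribution) and URI are installed. The helper name extract_assets and the example.com URL are made up for illustration. It takes the HTML you would get from $http->body() and returns the stylesheet, script, and image links as absolute URLs.

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # CPAN module, ships with HTML-Parser
use URI;                # needed by HTML::LinkExtor to absolutize links

# Hypothetical helper: parse $html once and sort the links it contains
# into css / js / img lists. $base_url is the URL the page came from, so
# relative links can be turned into absolute ones.
sub extract_assets {
    my ($html, $base_url) = @_;
    my %assets = (css => [], js => [], img => []);

    # No callback given, so links are collected and read back via links().
    my $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse($html);
    $parser->eof;

    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;    # e.g. ('img', src => URI object)
        if ($tag eq 'img' && $attr{src}) {
            push @{ $assets{img} }, "$attr{src}";
        }
        elsif ($tag eq 'script' && $attr{src}) {
            push @{ $assets{js} }, "$attr{src}";
        }
        elsif ($tag eq 'link' && $attr{href} && "$attr{href}" =~ /\.css(?:\?|$)/i) {
            push @{ $assets{css} }, "$attr{href}";
        }
    }
    return \%assets;
}

# Example with an inline page instead of a real download.
my $html = '<html><head><link rel="stylesheet" href="style.css">'
         . '<script src="/js/app.js"></script></head>'
         . '<body><img src="pics/cat.jpg"></body></html>';
my $assets = extract_assets($html, 'http://www.example.com/dir/index.html');
print "$_\n" for map { @{ $assets->{$_} } } qw(css js img);
```

Since the page is parsed exactly once, there is no loop that can run forever, and each returned URL can be passed straight to $http->request() after calling $http->reset().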
