
Automatic Download of Multiple Files with Perl

by monkfan (Curate)
on Apr 16, 2007 at 15:03 UTC ( #610366=perlquestion )
monkfan has asked for the wisdom of the Perl Monks concerning the following question:

For example now I want to download this list of files.

What I usually do is run "wget" one by one on the URL of each file. Is there a way to do it automatically with Perl? Since each file is quite large, I need to download them overnight.

I am aware that I could write a parser for that website, extract the *.gz link for each file, and run wget on each of them.

I was wondering if it could be done in a quicker way. Is there any CPAN module that does this kind of task?


Re: Automatic Download of Multiple Files with Perl
by Joost (Canon) on Apr 16, 2007 at 15:16 UTC
      I second Joost's suggestion to use or embed wget to get an entire directory in one shot.

      While I like and use LWP a lot for web-site interaction, from personal experience I have seen LWP hang when downloading larger files (>200MB) over less reliable networks, whereas wget has never let me down in that context.
      Its ability to restart downloads after a network failure (even from the exact byte count where the connection was lost) is a nice feature, and it can run unattended.


      It will be less of a hassle to use an external program optimized for downloading, like wget or curl, than to try to use LWP. LWP is a general-purpose library for interacting with websites, and it sometimes croaks when downloading large files. Also, LWP has no built-in support for downloading entire directories or resuming aborted downloads; all of that logic has to be implemented in your code. With wget/curl you get all of this "built in", and your Perl code is a simple wrapper.

      offtopic: In college, we had a thin-pipe connection and could not download research papers without timeouts, etc. I used to create a file with the list of URLs to fetch, run wget -i <filename> in a detached screen session, and it would download everything overnight. I think you need to do something similar here.
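The wrapper described above can be sketched in a few lines of Perl. This is a minimal sketch, not a tested tool: the list file name "urls.txt" is a placeholder, and the wget flags used are -c (resume partial downloads), -t 0 (retry indefinitely), and -i (read URLs from a file).

```perl
use strict;
use warnings;

# Build the wget argument list: -c resumes partial files,
# -t 0 retries indefinitely, -i reads one URL per line from $list.
sub build_wget_cmd {
    my ($list) = @_;
    return ( 'wget', '-c', '-t', '0', '-i', $list );
}

# 'urls.txt' is a hypothetical file name; use your own list here.
my @cmd = build_wget_cmd('urls.txt');
system(@cmd) == 0
    or warn "wget exited with status ", $? >> 8, "\n";
```

Running this from cron or inside a detached screen session gives the unattended overnight download the poster asked for.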


Re: Automatic Download of Multiple Files with Perl
by jettero (Monsignor) on Apr 16, 2007 at 15:18 UTC

    You want to read about WWW::Mechanize, the descendant of LWP that will help you complete your task. It's surprisingly easy. You'll start with three lines (from the synopsis) and have a simple 20-line program that downloads everything before you know it.

    Update: Here's an example of fetching a URL and writing it directly to a file. I consider that task to be somewhat un-obvious (or at least, not obviously documented):

    use strict;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new;
    $mech->get("");
    $mech->get(
        $mech->find_link( text => "F" )->url,
        ":content_file" => "current-kernel.tar.bz2",
    );


Re: Automatic Download of Multiple Files with Perl
by Corion (Pope) on Apr 16, 2007 at 15:24 UTC

    See the batch download section in the documentation of WWW::Mechanize::Shell. Following the example there will create a WWW::Mechanize script for you that performs a batch download of all links of a certain class from a web page.
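A hand-rolled version of what that generated script does can be sketched directly with WWW::Mechanize. The filter below is testable on its own; the download loop is commented out because the page URL is a placeholder and the selection rule (links ending in .gz) is an assumption about the poster's site.

```perl
use strict;
use warnings;

# Keep only the URLs that end in .gz -- an assumed selection rule
# for the archive links the poster wants.
sub gz_links {
    my (@urls) = @_;
    return grep { /\.gz\z/ } @urls;
}

# Sketch of the download loop; 'http://example.com/data/' is hypothetical.
# use WWW::Mechanize;
# my $mech = WWW::Mechanize->new( autocheck => 1 );
# $mech->get('http://example.com/data/');
# for my $url ( gz_links( map { $_->url_abs } $mech->links ) ) {
#     my ($file) = $url =~ m{([^/]+)\z};    # save under the remote name
#     $mech->get( $url, ':content_file' => $file );
# }
```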

Re: Automatic Download of Multiple Files with Perl
by logie17 (Friar) on Apr 16, 2007 at 16:26 UTC
    You might just browse through CPAN and see if there is anything else. A quick search turned up a couple of interesting things that may work for you: searching for wget brings up Dicop::Client::wget and File::Fetch.
    s;;5776?12321=10609$d=9409:12100$xx;;s;(\d*);push @_,$1;eg;map{print chr(sqrt($_))."\n"} @_;
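Of the two modules mentioned above, File::Fetch is the more general fit. A minimal sketch of looping it over a URL list follows; the example.com URLs are placeholders, and the File::Fetch calls are guarded with eval so the sketch still runs where the module is not installed.

```perl
use strict;
use warnings;

# Collect the URLs to download; here they come from __DATA__,
# but reading them from a file works the same way.
sub read_urls {
    my ($fh) = @_;
    my @urls;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @urls, $line if $line =~ /\S/;
    }
    return @urls;
}

my @urls = read_urls( \*DATA );

# File::Fetch does one download per call, so loop over the list.
if ( eval { require File::Fetch; 1 } ) {
    for my $url (@urls) {
        my $ff = File::Fetch->new( uri => $url );
        $ff->fetch( to => '.' )
            or warn "failed: ", $ff->error, "\n";
    }
}

__DATA__
http://example.com/one.gz
http://example.com/two.gz
```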
Re: Automatic Download of Multiple Files with Perl
by Scott7477 (Chaplain) on Apr 16, 2007 at 21:48 UTC

Node Type: perlquestion [id://610366]
Approved by Joost