
How To Download DAT Files From Unsecured Website

by Marjan (Initiate)
on Jan 31, 2012 at 01:29 UTC (#950854=perlquestion)
Marjan has asked for the wisdom of the Perl Monks concerning the following question:


I need to download some data files from a website, and would welcome any suggestions.

The website has three pull-down menus: firm name, date, and file format. I want to download data for every firm and date, in other words firm1-June2001, firm1-July2001, …, firm1-December2011, firm2-June2001, firm2-July2001, …, firm2-December2011. For each one I would also like to choose “dat” from the format pull-down menu and then press the download button to save the file to my machine.
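Enumerating every firm-date pair is the mechanical part of this. A minimal sketch, assuming the date menu's values look like "June2001" and using hypothetical form field names firm, date and format (the real names are the name="..." attributes of the three select elements in the page's HTML source):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month names as ASSUMED to appear in the site's date menu ("June2001").
my @MONTH_NAMES = qw(January February March April May June
                     July August September October November December);

# Every month label from ($y1, $m1) through ($y2, $m2) inclusive; months are 1..12.
sub month_labels {
    my ( $y1, $m1, $y2, $m2 ) = @_;
    my @labels;
    my ( $y, $m ) = ( $y1, $m1 );
    while ( $y < $y2 || ( $y == $y2 && $m <= $m2 ) ) {
        push @labels, $MONTH_NAMES[ $m - 1 ] . $y;
        if ( ++$m > 12 ) { $m = 1; $y++ }
    }
    return @labels;
}

# Pair every firm with every month in firm-major order:
# firm1-June2001, firm1-July2001, ..., firm2-June2001, ...
sub firm_month_jobs {
    my ( $firms, $months ) = @_;
    my @jobs;
    for my $firm (@$firms) {
        push @jobs, [ $firm, $_ ] for @$months;
    }
    return @jobs;
}

# Form values for one download. The field names are HYPOTHETICAL:
# read the real name="..." attributes of the three <select> menus.
sub form_fields {
    my ( $firm, $month ) = @_;
    return { firm => $firm, date => $month, format => 'dat' };
}

my @all_jobs = firm_month_jobs( [qw(firm1 firm2)], [ month_labels( 2001, 6, 2011, 12 ) ] );
printf "%d downloads, first %s-%s\n", scalar @all_jobs, @{ $all_jobs[0] };
```

June 2001 through December 2011 is 127 months, so two firms give 254 downloads. With WWW::Mechanize, each job could then be sent with $mech->submit_form( with_fields => form_fields( $firm, $month ) ) and saved with $mech->save_content( $file ); that only works if the menus are a plain HTML form rather than JavaScript-driven widgets.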

I would also like to slow the downloads down so I don't overload the website's server, and to produce a log file recording which firm-date files were downloaded and whether errors occurred. For instance, I want to distinguish between a missing file and a download error.
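One way to cover both the throttling and the log is to pause between requests and write one line per firm-date. This sketch ASSUMES the server answers a missing file with 404 (or 410), or with an HTML page where a .dat file was expected; any other non-2xx status counts as a download error. The fetcher is passed in as a code ref (a hypothetical interface) so the logic can be tried without touching the site:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one attempt. ASSUMPTION: the site signals a missing firm-date
# with 404/410, or with an HTML page instead of a .dat file; check this
# against the real site. HTTP::Tiny reports connection failures as 599.
sub classify_download {
    my ( $status, $content_type ) = @_;
    $content_type = '' unless defined $content_type;
    return 'missing' if $status == 404 || $status == 410;
    return 'error'   if $status < 200 || $status >= 300;
    return 'missing' if $content_type =~ m{^text/html}i;
    return 'ok';
}

# Append one tab-separated log line: firm, month, result, status, time.
sub log_result {
    my ( $log_fh, $firm, $month, $result, $status ) = @_;
    printf {$log_fh} "%s\t%s\t%s\t%d\t%s\n",
        $firm, $month, $result, $status, scalar localtime;
}

# Run each [firm, month] job through $fetch, a code ref (hypothetical
# interface) returning (status, content_type). Every outcome is logged,
# and we sleep $delay seconds between requests to go easy on the server.
sub run_jobs {
    my ( $jobs, $fetch, $log_fh, $delay ) = @_;
    my %count;
    for my $i ( 0 .. $#$jobs ) {
        my ( $firm, $month ) = @{ $jobs->[$i] };
        my ( $status, $type ) = $fetch->( $firm, $month );
        my $result = classify_download( $status, $type );
        log_result( $log_fh, $firm, $month, $result, $status );
        $count{$result}++;
        sleep $delay if $i < $#$jobs;    # no pause after the last job
    }
    return \%count;
}
```

In a real run, $fetch would wrap something like HTTP::Tiny's get() and return ( $res->{status}, $res->{headers}{'content-type'} ), and $delay would be a few seconds rather than 0.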

I am running this program on a Windows machine with Chrome.

I found the following code at and am looking for suggestions on how to adapt it. The comments are my additions.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;    # LWP::UserAgent acts as a "virtual browser"
use HTTP::Request;

my $ua   = LWP::UserAgent->new;
my $user = 'username';    # credentials are only needed if the site uses HTTP Basic auth
my $pass = 'password';
my $URL  = '';

# Build a local file name from the last path segment of the URL.
my $filename = substr( $URL, rindex( $URL, "/" ) + 1 );
print "$filename\n";

# Open the local file for writing; binmode prevents newline translation on Windows.
open( my $out, '>', $filename ) or die "Cannot open '$filename': $!";
binmode $out;

print "Fetching $URL\n";
my $expected_length;
my $bytes_received = 0;

# Build a GET request and attach the Basic-auth credentials.
my $req = HTTP::Request->new( GET => $URL );
$req->authorization_basic( $user, $pass );

# The callback runs once for each chunk of the body as it arrives.
my $res = $ua->request( $req, sub {
    # @_ holds the callback's arguments: the data chunk and the response object.
    my ( $chunk, $response ) = @_;
    $bytes_received += length($chunk);

    # Read the expected size from the Content-Length header (0 if missing).
    unless ( defined $expected_length ) {
        $expected_length = $response->content_length || 0;
    }

    # Report progress on STDERR: percentage (when the size is known) and bytes.
    if ($expected_length) {
        printf STDERR "%d%% - ", 100 * $bytes_received / $expected_length;
    }
    print STDERR "$bytes_received bytes received\n";

    # Write the chunk to the local file.
    print {$out} $chunk;
} );

print $res->status_line, "\n";
close $out or die "Cannot close '$filename': $!";
# Don't keep an empty or partial file when the request failed.
unlink $filename unless $res->is_success;
exit;
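Because the script above opens the output file before it knows whether the request succeeded, a failed download has to be cleaned up by hand. A sketch of an alternative using HTTP::Tiny (a core module since Perl 5.14) instead of LWP: its mirror() method downloads into a temporary file and renames it into place only on success. The file-naming scheme and the command-line interface below are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# HYPOTHETICAL naming scheme, e.g. "firm1-June2001.dat".
sub local_name {
    my ( $firm, $month ) = @_;
    my $name = "$firm-$month.dat";
    $name =~ s/[^\w.-]+/_/g;    # replace characters awkward in file names
    return $name;
}

# Download $url to $file via $http->mirror(). HTTP::Tiny writes to a
# temporary file and renames it only on success, so a failed request
# leaves no empty .dat file. Returns the HTTP status code.
sub fetch_to_file {
    my ( $http, $url, $file ) = @_;
    my $res = $http->mirror( $url, $file );
    warn "$url: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{status};
}

if (@ARGV) {    # usage: perl fetch_one.pl URL FIRM MONTH
    my ( $url, $firm, $month ) = @ARGV;
    my $http = HTTP::Tiny->new( agent => 'research-downloader/0.1', timeout => 60 );
    print fetch_to_file( $http, $url, local_name( $firm, $month ) ), "\n";
}
```

A side benefit of mirror() is that rerunning the script sends If-Modified-Since for files already on disk, so unchanged files are not downloaded again.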

Re: How To Download DAT Files From Unsecured Website
by InfiniteSilence (Curate) on Jan 31, 2012 at 02:26 UTC

    First, I'm not sure that plucking code from the Internet (code you may or may not understand) and asking 'can you refactor this to do x and y?' is a brilliant strategy. Second, grabbing a whole bunch of files from somebody's website is probably a violation of their terms of service. Nevertheless, you might try something like this:

    linux> perl -e 'for my $month (qw|June July|) {for(1..3){my $url = qq|http://localhost/foo| . $_ . qq|.tar&dt=| . qq|$month-2001|; print qq|Doing >>> $url\n|; system(q|wget|, $url) }}'
    Replace the start of the URL with your target. Of course you'll need wget, but you can even download that for Windows nowadays. Please don't let this discourage you from learning Perl and trying to understand code that you find on the web. It just isn't a very good place to start if your intent is to learn this language. If your intent isn't to learn the language then you are in the wrong place. There are websites that will write scripts for you for a modest sum.
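Since the question mentions Windows, where cmd.exe does not treat single quotes as quoting characters, the same loop can live in a script file instead. A sketch with a pause between requests; the URL pattern is still the localhost placeholder from the one-liner, and taking the delay from the command line is an illustrative choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same URL pattern as the one-liner: month outer loop, file number inner loop.
# The base URL is a placeholder; replace it with the real target.
sub build_urls {
    my ( $base, $months, $count ) = @_;
    my @urls;
    for my $month (@$months) {
        for my $n ( 1 .. $count ) {
            push @urls, "$base$n.tar&dt=$month-2001";
        }
    }
    return @urls;
}

if (@ARGV) {    # usage: perl fetch.pl SECONDS_BETWEEN_REQUESTS
    my $delay = $ARGV[0];
    for my $url ( build_urls( 'http://localhost/foo', [qw(June July)], 3 ) ) {
        print "Doing >>> $url\n";
        # The list form of system() runs wget without a command shell on
        # Unix-like systems, so the "&" in the URL is not a shell operator.
        system( 'wget', $url ) == 0 or warn "wget failed for $url: $?\n";
        sleep $delay;
    }
}
```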

    Celebrate Intellectual Diversity

      Thank you for your help. The website makes the data available for download, and I'm a student using it for research purposes.

Node Type: perlquestion [id://950854]
Approved by planetscape