PerlMonks  

How To Download DAT Files From Unsecured Website

by Marjan (Initiate)
on Jan 31, 2012 at 01:29 UTC (#950854=perlquestion)
Marjan has asked for the wisdom of the Perl Monks concerning the following question:

Monks:

I need to download some data files from a website, and would welcome any suggestions.

The website has three pull-down menus: firm name, date, and file format. I want to download data for every firm and date. In other words, firm1-June2001, firm1-July2001, …, firm1-December2011, firm2-June2001, firm2-July2001, …, firm2-December2011. I would also like to choose “dat” from the format pull-down menu, and need to press the download button to download the file to my machine.
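As a minimal sketch, the firm and date combinations described above could be enumerated like this before any downloading starts. The firm names are placeholders, and the "June2001" label format is an assumption; the real values would have to be read from the site's pull-down menus.

```perl
use strict;
use warnings;

# Hypothetical firm identifiers; the real values would come from the
# site's firm pull-down menu.
my @firms = ('firm1', 'firm2');

my @month_names = qw(January February March April May June July
                     August September October November December);

# Return month labels such as "June2001" from a start month to an end
# month (both inclusive). Months are numbered 1..12.
sub month_labels {
    my ($start_year, $start_month, $end_year, $end_month) = @_;
    my @labels;
    my ($y, $m) = ($start_year, $start_month);
    while ($y < $end_year || ($y == $end_year && $m <= $end_month)) {
        push @labels, $month_names[$m - 1] . $y;
        if (++$m > 12) { $m = 1; $y++; }
    }
    return @labels;
}

my @months = month_labels(2001, 6, 2011, 12);

# One [firm, month] pair per file to request.
my @jobs;
for my $firm (@firms) {
    push @jobs, [$firm, $_] for @months;
}

printf "%d downloads planned, from %s-%s to %s-%s\n",
    scalar @jobs, @{ $jobs[0] }, @{ $jobs[-1] };
```

June 2001 through December 2011 is 127 months, so the two placeholder firms yield 254 planned downloads.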

I also would like to slow the download speed down so I don’t overload the website’s server, and have a file that indicates which firm-date files are downloaded and whether errors occurred. For instance, I want to distinguish between a missing file and a download error.
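The throttling and logging requirements might be sketched as follows. The names `classify_status` and `run_jobs` are hypothetical, and treating HTTP 404 as "missing file" is an assumption: the real site may report an absent file some other way, for example with a 200 response that carries an error page.

```perl
use strict;
use warnings;

# Map an HTTP status code to a label for the log. Treating 404 as
# "missing" is an assumption about how the site reports absent files.
sub classify_status {
    my ($code) = @_;
    return 'ok'      if $code >= 200 && $code < 300;
    return 'missing' if $code == 404;
    return 'error';
}

# Run each [firm, month] job through $fetch (a code ref that returns an
# HTTP status code), append one tab-separated line per job to $log_path,
# and pause $delay seconds between requests.
sub run_jobs {
    my ($jobs, $fetch, $log_path, $delay) = @_;
    open(my $log, '>>', $log_path) or die "Cannot open $log_path: $!";
    my $remaining = @$jobs;
    for my $job (@$jobs) {
        my ($firm, $month) = @$job;
        my $code = $fetch->($firm, $month);
        print {$log} join("\t", $firm, $month, $code, classify_status($code)), "\n";
        # Skip the pause after the last job or when no delay is requested.
        sleep $delay if --$remaining && $delay;
    }
    close $log or die "Cannot close $log_path: $!";
    return;
}
```

With LWP::UserAgent, `$fetch` could wrap a call such as `$ua->get($url)->code`. Passing the fetch step in as a code ref keeps the pacing and logging testable without touching the network.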

I am running this program on a Windows machine with Chrome.

I found the following code at http://www.perlmonks.org/?node_id=617277 and am looking for any suggestions on how to adapt it. The comments are my additions.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# LWP::UserAgent acts as a scriptable web client (a "virtual browser").
my $ua   = LWP::UserAgent->new;
my $user = 'username';
my $pass = 'password';
my $URL  = 'https://www.tta.thomson.com/msi/public1_5clients.html';

# Build the local file name from the last path segment of the URL.
my $filename = substr( $URL, rindex( $URL, "/" ) + 1 );
print "$filename\n";

# Open the local file for writing; $out is the filehandle, not the file itself.
open( my $out, '>', $filename ) or die "Cannot open $filename: $!";
binmode $out;

print "Fetching $URL\n";
my $expected_length;
my $bytes_received = 0;

# Build a GET request with HTTP basic authentication.
my $req = HTTP::Request->new( GET => $URL );
$req->authorization_basic( $user, $pass );

# The callback runs once per chunk of the response body as it arrives.
my $res = $ua->request(
    $req,
    sub {
        # @_ holds the callback's arguments: the chunk and the response object.
        my ( $chunk, $res ) = @_;
        $bytes_received += length($chunk);
        unless ( defined $expected_length ) {
            $expected_length = $res->content_length || 0;
        }
        # Report progress on STDERR, as a percentage when the length is known.
        if ($expected_length) {
            printf STDERR "%d%% - ", 100 * $bytes_received / $expected_length;
        }
        print STDERR "$bytes_received bytes received\n";
        # Append the chunk to the local file.
        print {$out} $chunk;
    }
);
print $res->status_line, "\n";
close $out or die "Cannot close $filename: $!";
exit;

Re: How To Download DAT Files From Unsecured Website
by InfiniteSilence (Curate) on Jan 31, 2012 at 02:26 UTC

    I'm not sure that plucking code from the Internet (that you may or may not understand) and asking 'hey...how can you refactor this to do x and y?' is a brilliant strategy. Also, grabbing a whole bunch of files from somebody's website is probably a violation of their terms of service. Nevertheless, you might try something like this:

    linux> perl -e 'for my $month (qw|June July|) {for(1..3){my $url = qq|http://localhost/foo| . $_ . qq|\.tar&dt=| . qq|$month-2001|; print qq|Doing >>> $url\n|; `wget "$url"` }}'
    Replace the start of the URL with your target. Of course you'll need wget, but you can even download that for Windows nowadays. Please don't let this discourage you from learning Perl and trying to understand code that you find on the web. It just isn't a very good place to start if your intent is to learn this language. If your intent isn't to learn the language then you are in the wrong place. There are websites that will write scripts for you for a modest sum.
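For readers who would rather not depend on wget, the same loop can be sketched in plain Perl with LWP::UserAgent, the module the original question already uses. The helper names, the script name `fetch.pl`, and the output file names are hypothetical; the URL pattern is copied from the one-liner and is a placeholder, not the real site's scheme.

```perl
use strict;
use warnings;

# Build the same URL pattern as the one-liner. The base URL and the "dt"
# parameter come from that example and are placeholders.
sub build_url {
    my ($base, $n, $month, $year) = @_;
    return "$base$n.tar&dt=$month-$year";
}

# Save one URL to a local file and return the HTTP status line.
sub fetch_to_file {
    my ($ua, $url, $file) = @_;
    my $res = $ua->get($url, ':content_file' => $file);
    return $res->status_line;
}

# Usage (hypothetical script name): perl fetch.pl http://localhost/foo
if (my $base = shift @ARGV) {
    require LWP::UserAgent;    # loaded only when a download is requested
    my $ua = LWP::UserAgent->new(timeout => 30);
    for my $month (qw(June July)) {
        for my $n (1 .. 3) {
            my $url = build_url($base, $n, $month, 2001);
            print "Doing >>> $url\n";
            print fetch_to_file($ua, $url, "foo$n-$month-2001.tar"), "\n";
            sleep 1;    # pause between requests to go easy on the server
        }
    }
}
```

Printing the status line for each request gives a first indication of whether a file was fetched, was not found, or failed for another reason.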

    Celebrate Intellectual Diversity

      Thank you for your help. The website makes the data available for downloading; as I'm a student I'm using it for research purposes.
