PerlMonks  

Ways to limit bandwidth of downloads?

by bwelch (Curate)
on Apr 18, 2006 at 14:43 UTC ( #544077=perlquestion )
bwelch has asked for the wisdom of the Perl Monks concerning the following question:

I've inherited a system that downloads bioinformatics data from various sites using Net::FTP. There are many scripts that use ftp to examine files and download those that are new and desired. All this works fairly well, but I've been asked to add a way to limit the bandwidth used by our downloads.

There are many cases of code like this:

    $f_curdir = $ftp1->pwd();
    $rc = $ftp1->cwd($dir_to_use);
    $ftp1->binary;
    @new_remote_list = $ftp1->dir(".");

It looks like I'll need to replace all uses of Net::FTP with either wget or cURL, which is a lot of work to gain one feature. Can you think of a way to limit the bandwidth used by these downloads without replacing Net::FTP?

Re: Ways to limit bandwidth of downloads?
by Corion (Pope) on Apr 18, 2006 at 14:46 UTC

    I haven't tried it, but using Net::FTP::Throttle instead of Net::FTP should be all you need. Another way could be to add traffic shaping to your kernel configuration, but that will surely be more work.

      It's implemented using "sleep .1". Not good...
Re: Ways to limit bandwidth of downloads?
by Fletch (Chancellor) on Apr 18, 2006 at 14:48 UTC

    Not Perl solutions, but these won't require much (if any) modifications to the existing code:

    • consult with your sysadmin and/or network admins and see if they can't put some sort of bandwidth limiting in place at the router or firewall level
    • route things through a squid proxy and let it do the limiting
Re: Ways to limit bandwidth of downloads?
by jhourcle (Prior) on Apr 18, 2006 at 15:21 UTC

    How are the calls to Net::FTP made?

    If the system was designed to be maintainable, you'd want to add a level of abstraction -- pass the URL, or the connection details and file path, to a function. If it was written for a single use, and then slowly expanded over time with copy and paste as needed, it's possible that there are calls to Net::FTP all over the place.

    The main code should only need to worry about four things: did the command work, what is in a directory, where was the file downloaded to, and what were the contents of the file? (Normally you only need the third or the fourth, not both.) I typically have two functions, 'get file contents' and 'download file'. 'Get file contents' uses the 'download file' routine or vice versa, depending on which modules I'm using. IDL forces me to write everything to files first and then load them back in, whereas in Perl I only write out the files when necessary. If I'm looking at caching, 'get file contents' calls 'download file'; if there's no reason to cache, it's probably the other way around.

    This way, I can easily add caching, change the underlying code for downloading files, add rate limiting, support for additional transfer protocols, etc, as needed.

    Given your situation, I'd probably first write a few tests, and make sure they all pass. I'd then move to encapsulate all of the current calls to Net::FTP, and then make sure that all of the tests still pass. Then I'd go ahead and change the Net::FTP calls to wget or curl, and make sure everything's still working fine.
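
    As a rough sketch of that encapsulation step, the wrapper below hides the transfer behind a single download_file() call and shells out to wget with its --limit-rate option. The function names and the 200k default are assumptions made for illustration; nothing here comes from the original system.

```perl
use strict;
use warnings;

# Hypothetical default cap; wget accepts k and m suffixes here.
my $RATE_LIMIT = '200k';

# Build the wget argument list separately so it can be inspected
# (and tested) without touching the network.
sub wget_command {
    my ($url, $dest, $rate) = @_;
    $rate = $RATE_LIMIT unless defined $rate;
    return ('wget', '-q', "--limit-rate=$rate", '-O', $dest, $url);
}

# Single entry point for the rest of the code: returns the local
# path on success, undef on failure.
sub download_file {
    my ($url, $dest) = @_;
    my @cmd = wget_command($url, $dest);
    # List-form system() bypasses the shell, so no quoting is needed.
    return system(@cmd) == 0 ? $dest : undef;
}

# 'get file contents' built on top of 'download file'.
sub get_file_contents {
    my ($url, $dest) = @_;
    my $path = download_file($url, $dest);
    return unless defined $path;
    open my $fh, '<:raw', $path or return;
    local $/;                 # slurp the whole file
    my $data = <$fh>;
    close $fh;
    return $data;
}
```

    With this in place, switching from wget to curl, or back to Net::FTP, only means changing download_file(); the callers stay the same.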

    The next alternative is to write your own FTP class that exposes the same interface as Net::FTP (it might even inherit from Net::FTP), and replace all of the calls to Net::FTP with calls to 'My::FTP' or whatever you end up calling it.
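
    A minimal sketch of such a subclass, assuming that overriding get() with a throttled read loop over retr() is acceptable. The package name follows the 'My::FTP' placeholder above; $MAX_BYTES_PER_SEC and _pause_for() are hypothetical names. It deliberately skips the restart-offset and filehandle variants that Net::FTP's own get() supports, so treat it as a starting point rather than a drop-in replacement.

```perl
package My::FTP;
use strict;
use warnings;
use parent 'Net::FTP';
use Time::HiRes qw(time sleep);

# Hypothetical package-wide cap in bytes per second.
our $MAX_BYTES_PER_SEC = 100_000;

# How long to pause so that $bytes transferred in $elapsed seconds
# averages out at no more than $rate bytes per second.
sub _pause_for {
    my ($bytes, $elapsed, $rate) = @_;
    my $pause = $bytes / $rate - $elapsed;
    return $pause > 0 ? $pause : 0;
}

# Simplified get( REMOTE [, LOCAL ] ): fetch via retr() and sleep
# between reads to hold the average rate down.
sub get {
    my ($self, $remote, $local) = @_;
    ($local = $remote) =~ s{.*/}{} unless defined $local;

    my $data = $self->retr($remote) or return;
    open my $out, '>', $local or do { $data->abort; return };
    binmode $out;

    my ($total, $start, $buf) = (0, time());
    while (my $len = $data->read($buf, 8192)) {
        print {$out} $buf;
        $total += $len;
        sleep _pause_for($total, time() - $start, $MAX_BYTES_PER_SEC);
    }
    close $out;
    $data->close;
    return $local;
}

package main;
```

    Existing scripts would then only need Net::FTP->new(...) changed to My::FTP->new(...); every other method is inherited unchanged.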

Re: Ways to limit bandwidth of downloads?
by BrowserUk (Pope) on Apr 18, 2006 at 16:00 UTC

    Here's one way. If you supply the Hash => \*FHGLOB parameter to the Net::FTP constructor, then a "hash mark" will be printed to the filehandle glob for each buffer load. If you tie the glob before passing it, the PRINT method will be called after each buffer is read. If you insert a short sleep at that point, you will effectively throttle the request for the next buffer load.

    This demonstrates the technique. Performing the calculations to allow the download rate to be specified and wrapping it up into a clean interface is left as an exercise for those that need it.

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time sleep ];
    use Net::FTP;

    $|=1;

    ## Tied-handle class: slot 0 holds the time of the previous PRINT
    sub TIEHANDLE { return bless [ 0 ], $_[0] }

    ## Called by Net::FTP each time it prints hash marks
    sub PRINT{
        my $self = shift;
        if( $self->[ 0 ] ) {
            my $delay = ( $self->[ 0 ]+1 - time() );
            printf "%f\n", $delay;
            sleep 1+$delay if 1+$delay > 0; ## Insert delay (never negative)
        }
        $self->[ 0 ] = time();
    }

    local *GLOB;
    tie *GLOB, 'main';

    my( $site, $dir, $file ) = $ARGV[ 0 ] =~ m[
        ^(?:ftp://)? ([^/]+) (/.*?) / ([^/]+$)
    ]x or die "Couldn't parse url";

    my $ftp = Net::FTP->new( $site, Hash => \*GLOB ) or die $@;
    $ftp->login( 'anonymous', 'anonymous@' ) or die $ftp->message;
    $ftp->cwd( $dir ) or die $ftp->message;
    $ftp->get( $file ) or die $ftp->message;
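
    One possible take on the exercise mentioned above, as a hedged sketch: a tied-handle class that counts the hash marks it receives and sleeps just long enough to hold the average at a requested rate. The class name ThrottleHandle and its constructor arguments are made up for illustration. It assumes each '#' stands for 1024 bytes (the default Net::FTP documents for hash marks) and that PRINT only ever receives '#' characters.

```perl
package ThrottleHandle;   # hypothetical name
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# tie *FH, 'ThrottleHandle', $bytes_per_second [, $bytes_per_mark ]
sub TIEHANDLE {
    my ($class, $rate, $per_mark) = @_;
    return bless {
        rate     => $rate,
        per_mark => $per_mark || 1024,   # Net::FTP's documented default
        bytes    => 0,
        start    => undef,
    }, $class;
}

# Each '#' printed stands for per_mark bytes received.
sub PRINT {
    my $self = shift;
    $self->{start} = time() unless defined $self->{start};
    my $marks = () = join('', @_) =~ /#/g;   # count the hash marks
    $self->{bytes} += $marks * $self->{per_mark};

    # Sleep just long enough to fall back to the target average rate.
    my $ahead = $self->{bytes} / $self->{rate}
              - (time() - $self->{start});
    sleep $ahead if $ahead > 0;
    return 1;
}

package main;
```

    To use it, tie the glob as in the script above, for example tie *GLOB, 'ThrottleHandle', 50_000; for roughly 50 KB/s, and pass Hash => \*GLOB to the constructor as before.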

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That actually seems like overkill ... ;-) After all, I just wrote in TIMTOWTDI Challenge: Open a file how to use the retr function of Net::FTP to grab stuff - inserting the sleep in there may be simpler than having to tie stuff ;-)

      (And, when I wrote it, I thought it was completely obscure ... here, less than one day later, it's coming in handy!)

        Maybe...though when I played with subclassing Net::FTP when trying to track down a mysterious bug some time ago, I found that once you start messing with doing the buffering yourself, you rapidly end up duplicating quite a lot of what Net::FTP normally does for you. I was more interested in finding the bug than solving the issues with the subclassing, so it's quite possible I was just doing it wrong.

        I thought the tie solution was rather nice because it relied on a documented interface but otherwise left the internals untouched. Theory says that substituting a subclass of a module for the module itself should be simple, but I've found that practice often proves otherwise. But, TIMTOWTDI. If you can avoid doing the buffering yourself, then overriding the retr method should work just as well :)


