http://www.perlmonks.org?node_id=638190

hacker has asked for the wisdom of the Perl Monks concerning the following question:

Previously, I was putting our external mirrors in an array called @mirrors, and rendering the page based on:

   $mirrors[rand @mirrors]/file.tar.bz2

This breaks down when one of the mirror sites is down, offline, or otherwise inaccessible.

I'm rewriting the entire site and want to do this more intelligently, by checking that the remote mirror is accessible before I write those links into the page for end users. I can do this using LWP and a HEAD request, checking $response->status_line for a 200 response code, like this:

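   use LWP::UserAgent;
   use HTTP::Request;

   my $ua       = LWP::UserAgent->new;
   my $url      = "$mirrors[rand @mirrors]/file.tar.bz2";
   my $request  = HTTP::Request->new(HEAD => $url);
   my $response = $ua->request($request);
   my $status   = $response->status_line;   # a string such as "200 OK"

   # compare the numeric code; the status line also carries the message text
   if ($response->code == 200) {... }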
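
The accessibility check described above could be applied to the whole array instead of to one random entry. The following is only a sketch under some assumptions: `head_ok` and `live_mirrors` are hypothetical names, the 5-second timeout is an arbitrary choice, and the optional `$probe` code ref is a hook added here so that the filter can be tried without network access.

```perl
use strict;
use warnings;

# Default probe: send a HEAD request and report whether the answer was 2xx.
# LWP::UserAgent is loaded lazily, so the filter itself has no hard
# dependency on it when a custom probe is supplied.
sub head_ok {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new( timeout => 5 );    # arbitrary timeout
    return $ua->head($url)->is_success;
}

# Return the mirrors whose copy of $file passes the probe.
# $probe is a hypothetical hook: a code ref that takes a URL and
# returns true when the mirror should be kept.
sub live_mirrors {
    my ( $mirrors, $file, $probe ) = @_;
    $probe ||= \&head_ok;
    return grep { my $url = "$_/$file"; $probe->($url) } @$mirrors;
}
```

With this in place, `my @live = live_mirrors( \@mirrors, 'file.tar.bz2' );` followed by `$live[rand @live]` keeps the original random selection but only among mirrors that answered. Note that the probes run one after another, so a dead mirror still costs up to one timeout per page build.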

I'd like to take that one step further, and present the user with the FASTEST server hosting our releases. In Debian, there is a tool called 'netselect', and another one that uses it called 'netselect-apt'. Together, these pick the fastest Debian mirrors for your current location and build a sources.list file for you.

Is there some way to do this with LWP or other magical incantations? Some method that will check the array for the fastest response time, and use that entry over any others?

Re: Selecting the "fastest" server listed in an array
by kyle (Abbot) on Sep 11, 2007 at 01:40 UTC

    This may work (I haven't tested it):

        use LWP::UserAgent;
        use HTTP::Request;
        use Time::HiRes qw( ualarm tv_interval gettimeofday );

        my $ua      = LWP::UserAgent->new;
        my $fastest = undef;

        MIRROR:
        foreach my $mirror ( @mirrors ) {
            my $url     = "$mirror/file.tar.bz2";
            my $request = HTTP::Request->new(HEAD => $url);

            # attempt to fetch from this mirror in less time
            # than the fastest so far
            my ( $start_time, $response );
            eval {
                local $SIG{ALRM} = sub { die "alarm\n" };

                # if we have a possible fastest mirror, set an alarm
                # (armed only after the handler above is installed)
                ualarm( $fastest->{response_time} * 1_000_000 )
                    if defined $fastest;

                $start_time = [gettimeofday];
                $response   = $ua->request($request);
                ualarm( 0 );
            };
            ualarm( 0 );    # cancel a pending alarm if the request died early

            # if the alarm went off (or some other error),
            # try next mirror
            next MIRROR if $@;

            my $response_time = tv_interval( $start_time );

            # if this was successful (compare the numeric code;
            # status_line would be a string such as "200 OK"),
            if ( $response->code == 200 ) {

                # if we haven't found a fast mirror yet,
                # or this one's faster,
                if ( ! defined $fastest
                     || $fastest->{response_time} > $response_time ) {

                    # store this one as the fastest
                    $fastest = { mirror        => $mirror,
                                 response_time => $response_time };
                }
            }
        }

    What's nice about this:

    • No forking.
    • If I've found a live site, I never wait for a dead one. In fact, I never wait for a site slower than the fastest one I've seen.
    • The $fastest that I find is a hash ref, so I can stash more info (such as the actual response) in there later, if I want.

    I don't know how netselect works, so I can't speak to how well this works by comparison. I just hope to give you a good starting point.
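
One case the loop above leaves open is what to do when no mirror answers in time and `$fastest` stays undefined. As a hedged sketch, assuming the `{ mirror => ..., response_time => ... }` hash ref from the reply, a hypothetical helper `download_url` could fall back to the random choice from the original question:

```perl
use strict;
use warnings;

# Build the download URL from the fastest mirror found, if any.
# When $fastest is undef (every probe failed or timed out), fall back
# to a random entry of the mirror list, as the original page did.
# Assumes @$mirrors is non-empty.
sub download_url {
    my ( $fastest, $mirrors, $file ) = @_;
    my $base = defined $fastest
        ? $fastest->{mirror}
        : $mirrors->[ rand @$mirrors ];
    return "$base/$file";
}
```

Because `$fastest` is a plain hash ref, extra data can be stored next to the timing in the same place where the loop assigns it, for example under a `response` key, without changing this helper.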

Re: Selecting the "fastest" server listed in an array
by mr_mischief (Monsignor) on Sep 11, 2007 at 03:52 UTC
    Consider that the mirror that is fastest for you may not be the fastest for your user. Given the nature and size of the internet, there's a fair chance that visitors to your site have somewhat different routing. ;-)
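
One way to soften this caveat, offered only as a sketch, is to treat the server-side timing as a default ordering and still list every responding mirror, so that a visitor with different routing can pick another one. The helper name `ranked_urls` is hypothetical, and it assumes a timing record has been collected for each live mirror (a hash ref with `mirror` and `response_time` keys), which the early-abort loop in the earlier reply does not do as written.

```perl
use strict;
use warnings;

# Sort probe results fastest first (as measured from the web server)
# and return one download URL per mirror, in that order.
sub ranked_urls {
    my ( $results, $file ) = @_;
    return map  { "$_->{mirror}/$file" }
           sort { $a->{response_time} <=> $b->{response_time} }
           @$results;
}
```

The page can then show the first URL as the suggested download and the rest as alternatives.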