http://www.perlmonks.org?node_id=1018717

alexkus has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Trying to use HTTP::Async to do some concurrent downloading, but it seems to take much longer than a traditional HTTP request made with LWP::UserAgent.

Minimal code below. I do a grab of a URL (I've replaced my internal URLs with a URL from the BBC website). When I run it I get:-

Doing it 1st way
1360837557 doing request
1360837557 done
1360837557 DEBUG: Status is: 200 OK
1360837557 LENGTH: 153421
1360837557 Doing it 2nd way
1360837557 NEWPOLL ADDED: 1
1360837557 DEBUG: Checking...
1360837558 DEBUG: Checking...
1360837559 DEBUG: Checking...
1360837561 DEBUG: Checking...
1360837562 DEBUG: Checking...
1360837563 DEBUG: Checking...
1360837564 DEBUG: Checking...
1360837565 DEBUG: Checking...
1360837566 DEBUG: Checking...
1360837567 DEBUG: Checking...
1360837568 DEBUG: Got response reqid=1
1360837568 DEBUG: Status is: 200 OK
1360837568 LENGTH: 151059

So the first attempt (via LWP) completes in a fraction of a second. The subsequent request (using HTTP::Async) takes 11 seconds or so. Looking at the tcpdump of the requests being sent I can't really see any difference between the two. And both are made by the same call to HTTP::Request anyway.

Code follows:-

#!/usr/bin/perl

# Turn on autoflush for stdout
$|=1;

use HTTP::Async;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use strict;

binmode STDOUT, ":utf8";

my $http_async = HTTP::Async->new( poll_interval => 0.05 );

my $url="http://www.bbc.co.uk/sport/0/";
# $url="http://www.google.com/";

print "Doing it 1st way\n";
my $ua=LWP::UserAgent->new;
print time()." doing request\n";
my $resp=$ua->request( HTTP::Request->new( "GET", $url ) );
print time()." done\n";
print time()." DEBUG: Status is: ".$resp->status_line."\n";
print time()." LENGTH: ".length($resp->as_string)."\n";

print time()." Doing it 2nd way\n";
my $reqid = $http_async->add( HTTP::Request->new( "GET", $url ) );
print time()." NEWPOLL ADDED: $reqid\n";

while( 1 )
{
    # Nothing queued or in flight: report the counters and try again
    if( $http_async->empty() )
    {
        print time()." DEBUG: to_send=".$http_async->to_send_count
            ." in_progress=".$http_async->in_progress_count
            ." to_return=".$http_async->to_return_count
            ." total=".$http_async->total_count."\n";
        sleep(1);
        next;
    }

    # Check for any results, waiting up to one second
    print time()." DEBUG: Checking...\n";
    my ( $resp, $reqid ) = $http_async->wait_for_next_response(1.0);
    if( !defined( $resp ) )
    {
        next;
    }

    print time()." DEBUG: Got response reqid=$reqid\n";
    print time()." DEBUG: Status is: ".$resp->status_line."\n";
    print time()." LENGTH: ".length($resp->as_string)."\n";
    last;
}

Any ideas?

Replies are listed 'Best First'.
Re: Slow HTTP::Async responses
by Khen1950fx (Canon) on Feb 14, 2013 at 21:04 UTC
    How about using AnyEvent::HTTP?
    #!/usr/bin/perl
    use strict;
    use AnyEvent;
    use AnyEvent::HTTP;
    use Time::HiRes qw(time);

    my $cv = AnyEvent->condvar;

    print "\n";
    print "Starting test...\n";
    print "\n";

    my $urls = [
        'http://www.google.com',
        'http://www.yahoo.com',
        'https://pause.perl.org',
        'http://www.perlmonks.com',
        'http://www.perl.com',
        'http://www.cpan.org'
    ];

    my $start = time;
    my $result;

    # The outer begin/end pair fires this callback once every request has ended
    $cv->begin( sub { ( shift() )->send($result); } );

    foreach my $url (@$urls) {
        $cv->begin;
        my $now = time;
        my $request;
        $request = http_request(
            GET     => $url,
            timeout => 3,
            sub {
                my ( $body, $hdr ) = @_;
                if ( $$hdr{'Status'} =~ /^2/ ) {
                    # Time::HiRes::time returns seconds, so report seconds
                    push( @$result,
                        join( " ",
                            ( $url, "=> \n length", $$hdr{'content-length'},
                              "\n loaded in", time - $now, "s" ) ) );
                }
                else {
                    push( @$result,
                        join( "",
                            "Error for ", $url, ": (", $hdr->{Status}, ") ",
                            $hdr->{Reason} ) );
                }
                undef $request;
                $cv->end;
            }
        );
    }

    $cv->end;

    my $foo = $cv->recv;
    print join( "\n", @$foo ), "\n" if defined $foo;
    print "\nTotal elapsed time: ", time - $start, "s\n\n";
    I got the idea here.
Re: Slow HTTP::Async responses
by bulk88 (Priest) on Feb 15, 2013 at 00:49 UTC
    I've found none of the Perl HTTP modules work well in async mode. Either they aren't really asynchronous (sleep timers, blocking I/O in addition to select, or things I never figured out), or they burn high amounts of CPU polling select. I use Win32::API and a small XS library to make calls to http://msdn.microsoft.com/en-us/library/windows/desktop/aa383630%28v=vs.85%29.aspx which is an internally threaded C library with a thread-pooling design, X threads each handling Y sockets through select() (it scales to 100s of simultaneous TCP/IP connections and can saturate a 100 Mbps line). Look for an equivalent C library with Perl bindings on your OS. I've also used aria2c with its XML-RPC interface, then read the downloaded files synchronously off disk (where the OS has them cached in RAM). select doesn't scale. You want to use epoll or some other next-generation event system. Perl isn't very efficient at the minutiae of turning packets into HTTP transactions.
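To illustrate the difference between blocking in select and polling it in a tight loop, here is a minimal sketch using the core IO::Select module on a local pipe. It is not taken from any of the HTTP modules discussed here; the pipe stands in for a socket, and the timeouts are arbitrary. It assumes a Unix-like system, since select() on Windows only works on sockets.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# A local pipe stands in for a network socket (assumption for the demo)
pipe( my $reader, my $writer ) or die "pipe failed: $!";
$writer->autoflush(1);

my $sel = IO::Select->new($reader);

# Nothing has been written yet, so can_read sleeps in the kernel for up
# to 0.1 s and then returns an empty list. No CPU is burned while waiting.
my @ready = $sel->can_read(0.1);
print "ready before write: ", scalar(@ready), "\n";

print {$writer} "hello\n";

# Data is now pending, so can_read returns at once with the readable
# handle instead of waiting out the 5 s timeout.
@ready = $sel->can_read(5);
print "ready after write: ", scalar(@ready), "\n";

my $line = <$reader>;
print "read: $line";
```

A loop that calls can_read(0) repeatedly would see the same data, but it would spin the CPU between arrivals, which is the kind of polling cost described above.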
Re: Slow HTTP::Async responses
by mbethke (Hermit) on Feb 14, 2013 at 21:09 UTC

    I'm on a particularly slow link here so nothing takes less than 5s, but I don't notice much of a difference either way. Sometimes the LWP::UserAgent request is faster, sometimes the HTTP::Async one. Since I take it you've done a network dump, can you see where the time difference goes, i.e. whether it's the sending of the request or the processing of the answer that gets delayed?
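One thing that might help narrow this down: the timestamps in the original output come from the built-in time(), which only has one-second resolution, so "a fraction of a second" cannot actually be read off the log. Below is a small sketch of sub-second timing with the core Time::HiRes module. The timed() helper is a hypothetical name introduced here, and the sleep is only a stand-in for the real request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: run a code block and return the elapsed
# wall-clock time in (fractional) seconds.
sub timed {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

# Stand-in for the real work, e.g. $ua->request(...) or the
# wait_for_next_response loop from the question.
my $elapsed = timed( sub { sleep(0.25) } );

printf "%.3f s elapsed\n", $elapsed;
```

Wrapping the LWP call and each wait_for_next_response call in the same helper, and comparing against the packet times in the tcpdump, should show whether the delay is before the request goes out or after the response has arrived.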