PerlMonks  

Timing out POE http client

by ryantate (Friar)
on Mar 28, 2006 at 23:00 UTC (#539816=perlquestion)

ryantate has asked for the wisdom of the Perl Monks concerning the following question:

There are plenty of great examples of building HTTP clients with POE. For starters, there are merlyn's Oct 2002 LinuxMag piece, a section of the POE Cookbook, and even a post here on PerlMonks by POE creator Rocco Caputo.

If that weren't enough, the dang thing can download 27 Web pages in less than three seconds.

But there's one weakness to POE::Component::Client::HTTP that no one seems to mention: it cannot time out Web connections. In fact, while timeouts are seemingly utilized in many of the above examples, the current HTTP client docs explicitly state, under a BUGS section at the very end:

The following spawn() parameters are accepted but not yet implemented: Timeout.

Instead, what the POE http client does is trigger the "response" event for each session within the timeout period, but does not close out a session until the http connection is actually closed. So $poe_kernel->run will not return until all http sessions are done, even if the timeout is well past.

My question: What is the best way to enforce a timeout on POE::Component::Client::HTTP such that $poe_kernel->run will return at or very near the timeout period, instead of many seconds later?

The only solution I have is to call $kernel->stop within the response handler when the last session response is done. But the stop method is experimental and has some serious caveats.

Some examples of POE::Component::Client::HTTP timeout in action:

The script (poe_delay.pl):

use strict;
use warnings;
use Time::HiRes qw( time );
use HTTP::Request;
use POE qw(Component::Client::HTTP);

my $start = time;
my $urls_left = 12;

POE::Component::Client::HTTP->spawn(
  Alias => 'ua',
  Timeout => shift || 10,
  FollowRedirects => 2,
  Streaming => 0,
);

POE::Session->create(
  inline_states => {
    _start => \&client_start,
    response => \&response_handler
  }
);

$poe_kernel->run;
print 'Run done in: ', time - $start, " seconds.\n";
exit 0;

sub client_start{
  my $kernel = $_[KERNEL];
  while (<DATA>) {
    chomp;
    $kernel->post(
      'ua',        # posts to the 'ua' alias
      'request',   # posts to ua's 'request' state
      'response',  # which of our states will receive the response
      HTTP::Request->new('GET', "http://$_")
    );
  }
}

sub response_handler {
  my ($request_packet, $response_packet, $kernel, $heap) = @_[ARG0, ARG1, KERNEL, HEAP];
  $urls_left--;
  my $request_object = $request_packet->[0];
  my $response_object = $response_packet->[0];
  if ($urls_left <= 0) {
    print 'Downloads done in: ', time - $start, " seconds.\n";
    # $kernel->stop;
  }
}

__DATA__
www.google.com
www.yahoo.com
www.amazon.com
www.ebay.com
news.yahoo.com
news.google.com
www.msn.com
www.slashdot.org
www.indymedia.org
www.sfgate.com
www.nytimes.com
www.cnn.com

Now if we run the above script with a timeout of 5 seconds, $poe_kernel->run does not return until well after the responses are all in:

ryantate [507] perl -w poe_delay.pl 5
Downloads done in: 5.41008615493774 seconds.
Run done in: 16.4065310955048 seconds.

If we increase the timeout to 20, we see that the responses and $poe_kernel->run finish at the same time. This is because all HTTP connections are closed within 20 seconds, I believe:

ryantate [508] perl -w poe_delay.pl 20
Downloads done in: 20.4675362110138 seconds.
Run done in: 20.4807510375977 seconds.

Finally, if we uncomment the $kernel->stop line inside sub response_handler in the above script, we get:

ryantate [513] perl -w poe_delay.pl 5
Downloads done in: 5.35386991500854 seconds.
Run done in: 5.36102390289307 seconds.

Replies are listed 'Best First'.
Re: Timing out POE http client
by rcaputo (Chaplain) on Mar 30, 2006 at 06:03 UTC

    Hey. First, you should also try contacting the author. I mainly read the site via RSS, and the sheer number of posts guarantees I miss all but the last 12 at any given time. I would have missed your post if someone hadn't pointed it out to me.

    The problem is twofold. First, there was a bug in POE::Component::Client::HTTP. Old request timeouts were not removed when the component followed redirects. I was able to find the problem thanks to your test program, so ++ to you. Version 0.74 should be on your favorite CPAN mirror any day now.
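To illustrate the class of bug described above (a timer that is never cleared keeps the event loop busy), here is a minimal POE-only sketch. It is not the component's actual code: the state names `got_response` and `timed_out` and the 5-second figure are made up for illustration. It sets a per-request timeout with `delay_set`, then cancels it with `alarm_remove` as soon as the (simulated) response arrives, so `run` can return right away.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );
use POE;

my $start         = time();
my $timeout_fired = 0;

POE::Session->create(
  inline_states => {
    _start => sub {
      my ($kernel, $heap) = @_[KERNEL, HEAP];
      # Hypothetical per-request timeout; remember its ID so it can be cancelled.
      $heap->{timeout_id} = $kernel->delay_set( timed_out => 5 );
      # Simulate a response that arrives immediately.
      $kernel->yield('got_response');
    },
    got_response => sub {
      my ($kernel, $heap) = @_[KERNEL, HEAP];
      # Cancel the pending timeout. Without this call the session would
      # stay alive, and run() would block, until the 5 seconds elapsed.
      $kernel->alarm_remove( delete $heap->{timeout_id} );
    },
    timed_out => sub { $timeout_fired = 1 },
  },
);

POE::Kernel->run;
my $elapsed = time() - $start;
printf "run returned after %.2f seconds\n", $elapsed;
```

Commenting out the `alarm_remove` line should make `run` wait for the full timeout, which is roughly the symptom in the original test program.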

    The second problem is how POE::Component::Client::HTTP uses POE::Component::Client::Keepalive: simplistically. The short explanation is that unused connections are kept alive longer than you probably want. It's easy enough to work around. Replace the start of your program with this:

    use strict;
    use warnings;
    use Time::HiRes qw( time );
    use HTTP::Request;
    use POE qw(Component::Client::HTTP Component::Client::Keepalive);

    my $start = time;
    my $urls_left = 12;

    my $cm = POE::Component::Client::Keepalive->new(
      keep_alive => 1
    );

    POE::Component::Client::HTTP->spawn(
      Alias => 'ua',
      Timeout => shift || 10,
      FollowRedirects => 2,
      Streaming => 0,
      ConnectionManager => $cm,
    );

    The new code creates a custom POE::Component::Client::Keepalive with a super-low keep-alive timeout. Unused sockets are discarded after 1 second, so they stop holding the program hostage for so long. When POE::Component::Client::HTTP is spawned, it's told to manage connections with the custom Keepalive component rather than simplistically create its own.

    These changes work for me, at least with the newly released timeout fix.

    1) poerbook:~/projects/poco-client-http% make && perl perlmonks.perl 20
    Response (200)
    Downloads done in: 7.075767993927 seconds.
    Run done in: 8.70663499832153 seconds.

    I also removed that warning from the BUGS. Its very presence was a bug. And speaking of bugs, ConnectionManager isn't documented. I'll create a ticket for that at rt.cpan.org and get it in the next release.

    -- Rocco Caputo - http://poe.perl.org/

      Hello Rocco and thank you for your reply! Next time I will try emailing you directly. I didn't know if this was a common issue.

      I have a follow-up question. Instead of waiting for the (admittedly short) one-second keepalive timeout, could I not call the shutdown method on the ConnectionManager when my results are all back? I know it is in $heap->{cm} in the parent Component::Client::HTTP session, but I do not know how to access this heap. "cm" does not seem to be in the heap available to my response handler.

        Good question. The short answer's no, POE::Component::Client::HTTP doesn't expose the connection manager it uses. It also doesn't have a way to be shut down ahead of time, which hasn't been needed until now.

        If you're creating your own POE::Component::Client::Keepalive and passing that as the ConnectionManager, however, you still have $cm floating around, and you can shut that down without ill effects. Caveat: I've only tried this in your example program, and only when there are no outstanding requests.
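A rough sketch of that suggestion, assuming you create the Keepalive object yourself and keep `$cm` in scope: call `$cm->shutdown` from the response handler once the last response is in. The two-host URL list and the 5-second `Timeout` are placeholders, and the caveat above still applies: this has only been described as working when no requests are outstanding.

```perl
use strict;
use warnings;
use HTTP::Request;
use POE qw(Component::Client::HTTP Component::Client::Keepalive);

# Placeholder hosts for illustration; substitute your own list.
my @hosts     = qw( www.example.com www.example.org );
my $urls_left = @hosts;

# Keep our own handle on the connection manager so we can shut it down later.
my $cm = POE::Component::Client::Keepalive->new( keep_alive => 1 );

POE::Component::Client::HTTP->spawn(
  Alias             => 'ua',
  Timeout           => 5,
  ConnectionManager => $cm,
);

POE::Session->create(
  inline_states => {
    _start => sub {
      my $kernel = $_[KERNEL];
      foreach my $host (@hosts) {
        $kernel->post( ua => request => response =>
          HTTP::Request->new( GET => "http://$host" ) );
      }
    },
    response => sub {
      my ($request, $response) = ( $_[ARG0][0], $_[ARG1][0] );
      printf "%s -> %s\n", $request->uri, $response->code;

      # Wait until every request has been answered (or has timed out).
      return if --$urls_left > 0;

      # No requests are outstanding now, so release the idle sockets
      # instead of waiting for the keep-alive timer to expire.
      $cm->shutdown;
    },
  },
);

POE::Kernel->run;
print "run returned\n";
```

Unlike `$kernel->stop`, this leaves the kernel to wind down on its own once nothing is holding it open.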
