PerlMonks  

Re: Get Timout with LWP

by tachyon (Chancellor)
on Jun 21, 2004 at 00:43 UTC (#368344)


in reply to Get Timout with LWP

LWP::Simple is just that. Simple. It will hang if it can open a socket to a server but the server then plays dead and does not respond. LWP::UserAgent is much more sophisticated with its timeouts and will return if it times out.......

BUT THERE IS A CAVEAT: Because of the way it is implemented in LWP::UserAgent, the timeout is best described as *fuzzy*. It will time out, but if you are connected to a (say) overloaded server that trickle-feeds data back to your client application, it may take orders of magnitude longer to time out than the number you specify. If you absolutely need a hard limit, use alarm.
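For what it's worth, a hard limit with alarm might look something like the sketch below. The helper name with_hard_timeout is made up for illustration and is not part of LWP. It assumes a platform where alarm can interrupt blocking calls; on Win32 Perl, alarm is emulated and cannot interrupt a blocking system call, so treat this as a Unix-flavoured sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): run $code, but give up after
# $seconds of wall-clock time.
# Returns (1, $result) on success, (0, $error) on timeout or other failure.
sub with_hard_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # The handler dies, which unwinds out of whatever $code was doing.
        local $SIG{ALRM} = sub { die "hard timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;    # finished in time, cancel the alarm
        1;
    };
    my $err = $@;
    alarm 0;        # cancel any alarm still pending if $code died early
    return (1, $result) if $ok;
    return (0, $err || "unknown error\n");
}
```

With LWP you would wrap the request, something like `my ($ok, $res) = with_hard_timeout(10, sub { $ua->get($url) });`, where `$ua` is your LWP::UserAgent object and `$url` is whatever you are fetching.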

The test server and client below show the effect. Although the timeout is set to 2 seconds, LWP will wait 30 seconds to get the trickled data. Effectively, what the timeout means to LWP is 1) make the connection within TIMEOUT seconds and 2) get at least one byte of data from a non-blocking socket every TIMEOUT seconds. It will return if either of these is not fulfilled, but as you can see you can still kinda hang it.

As an aside, if you are scraping sites and they run anti-scraping response throttling, this scenario is quite realistic.

# this is the test server that does a trickle feed response
#!/usr/bin/perl
use IO::Socket;
use IO::Select;

$lsn = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => 'localhost',
    LocalPort => 9000,
);
my $client = new IO::Select( $lsn );

while( my @ready = $client->can_read ) {
    for my $fh (@ready) {
        if($fh == $lsn) {
            warn "Accepted new socket\n";
            my $new = $lsn->accept;
            $client->add($new);
        }
        else {
            # Process socket
            warn "Getting data\n";
            $data = <$fh>; # yeah yeah this is only a toy app
            warn "Got $data\nDoing stuff slowly!\n";
            my @response = split '', "HTTP/1.1 200 OK\n\nHello World!\n";
            for( @response) {
                warn $_;
                sleep 1;
                print {$fh} $_;
            }
            $client->remove($fh);
            $fh->close();
        }
    }
}

# this is the test client
#!/usr/bin/perl
use LWP::UserAgent;
use Data::Dumper;

my $ua = LWP::UserAgent->new( timeout => 2 );
$response = $ua->get('http://localhost:9000/');
print Dumper $response;
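To make the "at least one byte every TIMEOUT seconds" reading concrete, here is a rough sketch of a read loop with that kind of per-wait timeout. It is only an illustration of the semantics described above, not LWP's actual code, and read_with_idle_timeout is a made-up name.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical illustration (not LWP's code): read until EOF, giving up only
# if NO data at all arrives for $idle seconds.
# Returns ($data, $timed_out).
sub read_with_idle_timeout {
    my ($fh, $idle) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while (1) {
        # The clock restarts on every pass, so each byte that arrives
        # buys the sender another $idle seconds.
        my @ready = $sel->can_read($idle);
        return ($buf, 1) unless @ready;
        my $n = sysread($fh, $buf, 8 * 1024, length $buf);
        die "read failed: $!" unless defined $n;
        return ($buf, 0) if $n == 0;    # EOF, we have everything
    }
}
```

Against the toy server above, which sends 30 characters one second apart, an idle limit of 2 seconds never trips: every wait ends after about a second, so the loop runs for the full 30 seconds.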

cheers

tachyon


Re^2: Get Timout with LWP
by IlyaM (Parson) on Jun 21, 2004 at 15:54 UTC
    LWP::Simple is just that. Simple. It will hang if it can open a socket to a server but the server then plays dead and does not respond. LWP::UserAgent is much more sophisticated with its timeouts and will return if it times out.......

    This is quite wrong. LWP::Simple is just a wrapper on top of LWP::UserAgent, so it behaves exactly the same way as LWP::UserAgent concerning timeout handling. The only exception is that LWP::Simple always uses the default timeout value (180 seconds), whereas with LWP::UserAgent the timeout can be set.

    --
    Ilya Martynov, ilya@iponweb.net
    CTO IPonWEB (UK) Ltd
    Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
    Personal website - http://martynov.org

      Unfortunately I am not wrong. The hanging behaviour is real. The underlying reason is that LWP::Simple uses (for http) the _trivial_http_get() function, not LWP::UserAgent. This implements a 60 second IO::Socket::INET connection timeout, but no timeout on receiving data. Modifying the test code to use LWP::Simple proves the point (LWP::Simple will wait 500 seconds to get the data: it gets an instant socket, then nothing for 500 seconds). This function remains active in the latest LWP::Simple.

      I agree it should really be a subclass of LWP::UserAgent but for http it is not. The timeout behaviour for LWP::Simple and LWP::UserAgent could IMHO use some work.

      C:\>type server.pl
      #!/usr/bin/perl
      use IO::Socket;
      use IO::Select;

      $lsn = IO::Socket::INET->new(
          Listen    => 1,
          LocalAddr => 'localhost',
          LocalPort => 9000,
      );
      my $client = new IO::Select( $lsn );

      while( my @ready = $client->can_read ) {
          for my $fh (@ready) {
              if($fh == $lsn) {
                  warn "Accepted new socket\n";
                  my $new = $lsn->accept;
                  $client->add($new);
              }
              else {
                  # Process socket
                  warn "Getting data\n";
                  $data = <$fh>; # yeah yeah
                  warn "Got $data\nDoing nothing forever!\n";
                  my @response = split '', "HTTP/1.1 200 OK\n\nHello World!\n";
                  sleep 500;
                  print {$fh} @response;
                  $client->remove($fh);
                  $fh->close();
              }
          }
      }

      C:\>type lwp.pl
      #!/usr/bin/perl
      $|++;
      use LWP::Simple;
      $start = time();
      print "Begin at $start\n";
      $response = get('http://localhost:9000/');
      $end = time();
      my $time = $end - $start;
      print "Done at $end\nTook $time seconds\nGot:\n$response\n";

      C:\>perl lwp.pl
      Begin at 1087868116
      Done at 1087868616
      Took 500 seconds
      Got:
      Hello World!
      C:\>

      cheers

      tachyon

      Further to the previous, here is a possible patch to implement timeout behaviour in LWP::Simple. A way to set the $timeout passed to _trivial_http_get() is not included. A global would be easiest, I guess, given the very simple interface. It defaults to 180 seconds.

      $ diff -u Perl/site/lib/LWP/Simple.pm Perl/site/lib/LWP/Simple-patch.pm
      --- Perl/site/lib/LWP/Simple.pm Tue Jun 22 12:40:14 2004
      +++ Perl/site/lib/LWP/Simple-patch.pm Tue Jun 22 12:42:15 2004
      @@ -154,9 +154,10 @@

       sub _trivial_http_get
       {
      -   my($host, $port, $path) = @_;
      +   my($host, $port, $path, $timeout) = @_;
          #print "HOST=$host, PORT=$port, PATH=$path\n";
      -
      +   $timeout ||=180;
      +   my $hard_timeout = time() + $timeout;
          require IO::Socket;
          local($^W) = 0;
          my $sock = IO::Socket::INET->new(PeerAddr => $host,
      @@ -174,8 +175,15 @@

          my $buf = "";
          my $n;
      -   1 while $n = sysread($sock, $buf, 8*1024, length($buf));
      -   return undef unless defined($n);
      +   while( 1 ) {
      +       my $remaining = $hard_timeout - time();
      +       return undef if $remaining <= 0;
      +       if ( can_read( $sock, $remaining ) ) {
      +           $n = sysread($sock, $buf, 8*1024, length($buf));
      +           return undef unless defined($n);
      +           last if $n == 0; # we are eof
      +       }
      +   }

          if ($buf =~ m,^HTTP/\d+\.\d+\s+(\d+)[^\012]*\012,) {
              my $code = $1;
      @@ -191,6 +199,15 @@
          }

          return $buf;
      +}
      +
      +sub can_read {
      +    my($sock, $timeout) = @_;
      +    my $fbits = '';
      +    vec($fbits, fileno($sock), 1) = 1;
      +    my $nfound = select($fbits, undef, undef, $timeout);
      +    die "select failed: $!" unless defined $nfound;
      +    return $nfound > 0;
       }

      Administrator@JAMES /cygdrive/d
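If you want to play with the hard-deadline idea outside of LWP::Simple, the same loop can be written as a stand-alone function. This is a sketch only: read_with_deadline is a made-up name, it works on any filehandle with a real file descriptor, and the error handling differs slightly from the patch because four-argument select reports errors by returning -1 rather than undef.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the loop in the patch: read from $fh
# until EOF, but never wait past an absolute deadline.
# Returns the data read, or undef on timeout or read error.
sub read_with_deadline {
    my ($fh, $timeout) = @_;
    my $deadline = time() + $timeout;
    my $buf = '';
    while (1) {
        my $remaining = $deadline - time();
        return undef if $remaining <= 0;

        # Four-argument select: wait until $fh is readable or
        # $remaining seconds have passed.
        my $rbits = '';
        vec($rbits, fileno($fh), 1) = 1;
        my $nfound = select($rbits, undef, undef, $remaining);
        if ($nfound < 0) {
            next if $!{EINTR};          # interrupted by a signal, try again
            die "select failed: $!";
        }
        next if $nfound == 0;           # nothing yet, recheck the deadline

        my $n = sysread($fh, $buf, 8 * 1024, length $buf);
        return undef unless defined $n;
        last if $n == 0;                # EOF
    }
    return $buf;
}
```

Unlike a per-read timeout, the budget here shrinks on every pass, so a server that trickles one byte at a time cannot stretch the total wait beyond $timeout (give or take a second, since time() has one-second resolution).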

      cheers

      tachyon
