PerlMonks  

Get Timeout with LWP

by ecuguru (Monk)
on Jun 20, 2004 at 19:12 UTC (#368319=perlquestion)
ecuguru has asked for the wisdom of the Perl Monks concerning the following question:

Will the LWP timeout trigger if some data has been sent, and data communication has been interrupted for x seconds?

I've been using the following code to fetch hosted data from a timer-controlled field station:

    use LWP::Simple;
    my $url = 'http://site.com/file.html';
    print "Getting\n";
    my $content = get $url;
    print "Done\n";

The problem is that it sometimes hangs on the get $url and never comes back:

    Getting
    Done
    Getting
    Hang

It's an occasional glitch, but it hangs my Perl script entirely. I believe the problem is that the web server I am collecting data from is sometimes powered off before all of the data has been sent back, so my script hangs waiting for MORE data after receiving the first few bytes.
I have moved to LWP::UserAgent using code taken from:

    # http://www.utsc.utoronto.ca/~harper/cscb09/lecture11.html
    use LWP::UserAgent;
    use LWP;
    $ident = "The Server";        # this gets logged
    $timeout = 5;                 # in seconds
    $ua = new LWP::UserAgent;     # call the constructor
    $ua->agent($ident);           # set the id
    $ua->timeout($timeout);       # timeout
    my $req = new HTTP::Request GET => 'http://site.com';

Will the timeout used here fire even after some of the data has been received, or only if nothing is received at all?

Re: Get Timeout with LWP
by Zaxo (Archbishop) on Jun 20, 2004 at 20:04 UTC

    According to perldoc LWP::UserAgent it will:

    The requests is aborted if no activity on the connection to the server is observed for "timeout" seconds.
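A minimal sketch of relying on that inactivity timeout follows. The function name fetch_with_idle_timeout is illustrative and does not come from the thread. One assumption worth verifying against your installed LWP: a failure while the body is being read (for example a read timeout after the headers arrived) appears to be reported through an X-Died response header rather than through the status code, so the sketch checks both.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url, giving up when the connection has been idle for
# $idle_timeout seconds. Returns (body, undef) on success and
# (undef, reason) on failure.
sub fetch_with_idle_timeout {
    my ($url, $idle_timeout) = @_;

    my $ua = LWP::UserAgent->new(timeout => $idle_timeout);
    my $response = $ua->get($url);

    # Connection errors and timeouts before the headers arrive
    # come back as an error status.
    return (undef, $response->status_line) unless $response->is_success;

    # Assumption: an abort while reading the body is flagged via X-Died,
    # possibly alongside a 200 status and truncated content.
    my $died = $response->header('X-Died');
    return (undef, "aborted: $died") if defined $died;

    return ($response->decoded_content, undef);
}
```

Called as my ($body, $err) = fetch_with_idle_timeout('http://site.com/file.html', 5), this returns an error string instead of hanging when the server goes silent for five seconds. It does not bound the total transfer time.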

    After Compline,
    Zaxo

Re: Get Timeout with LWP
by tachyon (Chancellor) on Jun 21, 2004 at 00:43 UTC

    LWP::Simple is just that. Simple. It will hang if it can open a socket to a server but the server then plays dead and does not respond. LWP::UserAgent is much more sophisticated with its timeouts and will return if it times out.......

    BUT THERE IS A CAVEAT: because of the way it is implemented in LWP::UserAgent, the timeout is best described as *fuzzy*. It will time out, but if you are connected to a (say) overloaded server that trickle-feeds data back to your client application, it may take orders of magnitude longer to time out than the number you specify. If you absolutely need a hard limit, use alarm.
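A hedged sketch of the alarm approach is shown here. The name get_with_hard_limit and its parameters are made up for illustration. Two assumptions: alarm must be able to interrupt the request on your platform (it is unreliable for blocking I/O on Windows), and LWP may catch the die raised by the signal handler internally and turn it into an error response, so the sketch treats both outcomes as failure.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url with LWP's idle timeout plus an absolute wall-clock limit.
# Returns the body on success, undef on any failure or timeout.
sub get_with_hard_limit {
    my ($url, $idle_timeout, $hard_limit) = @_;

    my $ua = LWP::UserAgent->new(timeout => $idle_timeout);

    my $response = eval {
        local $SIG{ALRM} = sub { die "hard limit of $hard_limit seconds exceeded\n" };
        alarm $hard_limit;
        my $r = $ua->get($url);
        alarm 0;
        $r;
    };
    alarm 0;    # make sure no alarm stays pending if get() died

    # $response is undef if the die propagated out of get(); otherwise
    # LWP may have converted it into an error response or an X-Died header.
    return undef unless $response && $response->is_success;
    return undef if defined $response->header('X-Died');
    return $response->decoded_content;
}
```

With an idle timeout of 2 and a hard limit of 10, a server that sends one byte per second would be cut off after roughly ten seconds instead of being allowed to finish.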

    Here is an example of the trickle-feed case. Although the timeout is set to 2 seconds, LWP will wait 30 seconds to get the trickle data. Effectively, what the timeout means to LWP is 1) make the connection within TIMEOUT seconds and 2) get at least one byte of data from a non-blocking socket every TIMEOUT seconds. It will return if either of these is not fulfilled, but as you can see you can still kinda hang it.

    As an aside, if you are scraping sites and they run anti-scraping response throttling, this scenario is quite realistic.

        # this is the test server that does a trickle feed response
        #!/usr/bin/perl
        use IO::Socket;
        use IO::Select;

        $lsn = IO::Socket::INET->new(
            Listen    => 1,
            LocalAddr => 'localhost',
            LocalPort => 9000,
        );
        my $client = new IO::Select( $lsn );

        while ( my @ready = $client->can_read ) {
            for my $fh (@ready) {
                if ( $fh == $lsn ) {
                    warn "Accepted new socket\n";
                    my $new = $lsn->accept;
                    $client->add($new);
                }
                else {
                    # Process socket
                    warn "Getting data\n";
                    $data = <$fh>;    # yeah yeah this is only a toy app
                    warn "Got $data\nDoing stuff slowly!\n";
                    my @response = split '', "HTTP/1.1 200 OK\n\nHello World!\n";
                    for (@response) {
                        warn $_;
                        sleep 1;
                        print {$fh} $_;
                    }
                    $client->remove($fh);
                    $fh->close();
                }
            }
        }

        # this is the test client
        #!/usr/bin/perl
        use LWP::UserAgent;
        use Data::Dumper;

        my $ua = LWP::UserAgent->new( timeout => 2 );
        $response = $ua->get('http://localhost:9000/');
        print Dumper $response;
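The 30 second figure follows directly from the test server: the reply string is 30 characters long and one character is sent per second. The small sketch below just does that arithmetic; trickle_seconds is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;

# Total time a trickle-feeding server needs to deliver $text when it
# pauses $seconds_per_byte before each byte.
sub trickle_seconds {
    my ($text, $seconds_per_byte) = @_;
    return length($text) * $seconds_per_byte;
}

# Same reply string as the test server sends.
my $reply = "HTTP/1.1 200 OK\n\nHello World!\n";

printf "%d bytes at 1 second per byte: about %d seconds\n",
    length($reply), trickle_seconds($reply, 1);
```

By the same reasoning, a server pausing just under the 2 second timeout between bytes could stretch this reply to nearly 60 seconds without ever tripping the idle timeout, and a larger response scales accordingly.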

    cheers

    tachyon

      LWP::Simple is just that. Simple. It will hang if it can open a socket to a server but the server then plays dead and does not respond. LWP::UserAgent is much more sophisticated with its timeouts and will return if it times out.......

      This is quite wrong. LWP::Simple is just a wrapper on top of LWP::UserAgent, so it behaves in exactly the same way as LWP::UserAgent as far as timeout handling is concerned. The only difference is that LWP::Simple always uses the default timeout value (180 seconds), whereas with LWP::UserAgent the timeout can be set.

      --
      Ilya Martynov, ilya@iponweb.net
      CTO IPonWEB (UK) Ltd
      Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
      Personal website - http://martynov.org

        Unfortunately I am not wrong. The hanging behaviour is real. The underlying reason is that LWP::Simple uses (for http) the _trivial_http_get() function, not LWP::UserAgent. This implements a 60 second IO::Socket::INET connection timeout, but no timeout on data receive. Modifying the test code to use LWP::Simple proves the point (LWP::Simple will wait 500 seconds to get the data: it gets an instant socket, then nothing for 500 seconds). This function remains active in the latest LWP::Simple.

        I agree it should really be a subclass of LWP::UserAgent but for http it is not. The timeout behaviour for LWP::Simple and LWP::UserAgent could IMHO use some work.

            C:\>type server.pl
            #!/usr/bin/perl
            use IO::Socket;
            use IO::Select;

            $lsn = IO::Socket::INET->new(
                Listen    => 1,
                LocalAddr => 'localhost',
                LocalPort => 9000,
            );
            my $client = new IO::Select( $lsn );

            while ( my @ready = $client->can_read ) {
                for my $fh (@ready) {
                    if ( $fh == $lsn ) {
                        warn "Accepted new socket\n";
                        my $new = $lsn->accept;
                        $client->add($new);
                    }
                    else {
                        # Process socket
                        warn "Getting data\n";
                        $data = <$fh>;    # yeah yeah
                        warn "Got $data\nDoing nothing forever!\n";
                        my @response = split '', "HTTP/1.1 200 OK\n\nHello World!\n";
                        sleep 500;
                        print {$fh} @response;
                        $client->remove($fh);
                        $fh->close();
                    }
                }
            }

            C:\>type lwp.pl
            #!/usr/bin/perl
            $|++;
            use LWP::Simple;
            $start = time();
            print "Begin at $start\n";
            $response = get('http://localhost:9000/');
            $end = time();
            my $time = $end - $start;
            print "Done at $end\nTook $time seconds\nGot:\n$response\n";

            C:\>perl lwp.pl
            Begin at 1087868116
            Done at 1087868616
            Took 500 seconds
            Got:
            Hello World!

            C:\>
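One way to sidestep the question without patching anything may be to ask LWP::Simple for its user agent object. The module documents that it exports its LWP::UserAgent object as $ua when requested explicitly. The sketch below assumes, and this should be checked against the installed version, that get() then goes through that object and therefore honours its timeout.

```perl
use strict;
use warnings;
use LWP::Simple qw(get $ua);

# $ua is the LWP::UserAgent object that LWP::Simple exports on request.
# Lower its idle timeout from the default to 10 seconds.
$ua->timeout(10);

# Thin wrapper (hypothetical name) that reports a failed fetch.
sub simple_get_or_warn {
    my ($url) = @_;
    my $content = get($url);
    warn "Could not fetch $url\n" unless defined $content;
    return $content;
}
```

This keeps the one-line get() interface while still allowing the timeout to be tuned. It remains an idle timeout, so the trickle-feed caveat still applies.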

        cheers

        tachyon

        Further to the previous, here is a possible patch to implement timeout behaviour in LWP::Simple. The patch does not include a way to set the $timeout passed to _trivial_http_get(); a global would be easiest, I guess, given the very simple interface. It defaults to 180 seconds.

            $ diff -u Perl/site/lib/LWP/Simple.pm Perl/site/lib/LWP/Simple-patch.pm
            --- Perl/site/lib/LWP/Simple.pm Tue Jun 22 12:40:14 2004
            +++ Perl/site/lib/LWP/Simple-patch.pm Tue Jun 22 12:42:15 2004
            @@ -154,9 +154,10 @@

             sub _trivial_http_get
             {
            -    my($host, $port, $path) = @_;
            +    my($host, $port, $path, $timeout) = @_;
                 #print "HOST=$host, PORT=$port, PATH=$path\n";
            -
            +    $timeout ||=180;
            +    my $hard_timeout = time() + $timeout;
                 require IO::Socket;
                 local($^W) = 0;
                 my $sock = IO::Socket::INET->new(PeerAddr => $host,
            @@ -174,8 +175,15 @@

                 my $buf = "";
                 my $n;
            -    1 while $n = sysread($sock, $buf, 8*1024, length($buf));
            -    return undef unless defined($n);
            +    while( 1 ) {
            +        my $remaining = $hard_timeout - time();
            +        return undef if $remaining <= 0;
            +        if ( can_read( $sock, $remaining ) ) {
            +            $n = sysread($sock, $buf, 8*1024, length($buf));
            +            return undef unless defined($n);
            +            last if $n == 0; # we are eof
            +        }
            +    }

                 if ($buf =~ m,^HTTP/\d+\.\d+\s+(\d+)[^\012]*\012,) {
                     my $code = $1;
            @@ -191,6 +199,15 @@
                 }

                 return $buf;
            +}
            +
            +sub can_read {
            +    my($sock, $timeout) = @_;
            +    my $fbits = '';
            +    vec($fbits, fileno($sock), 1) = 1;
            +    my $nfound = select($fbits, undef, undef, $timeout);
            +    die "select failed: $!" unless defined $nfound;
            +    return $nfound > 0;
             }

            Administrator@JAMES /cygdrive/d
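The core idea of the patch, reading until EOF but giving up at an absolute deadline, can be tried outside LWP::Simple. The sketch below is a standalone illustration rather than the patch itself: read_all_with_deadline is a made-up name, and it swaps the hand-rolled four-argument select for IO::Select.

```perl
use strict;
use warnings;
use IO::Select;

# Read from $sock until EOF, but never for longer than $timeout seconds
# in total. Returns the data on EOF, undef on timeout or read error.
sub read_all_with_deadline {
    my ($sock, $timeout) = @_;

    my $deadline = time() + $timeout;
    my $select   = IO::Select->new($sock);
    my $buf      = '';

    while (1) {
        my $remaining = $deadline - time();
        return undef if $remaining <= 0;           # hard limit reached

        # Wait for data, but only for the time that is left.
        my @ready = $select->can_read($remaining);
        next unless @ready;

        my $n = sysread($sock, $buf, 8 * 1024, length $buf);
        return undef unless defined $n;            # read error
        return $buf if $n == 0;                    # EOF: transfer complete
    }
}
```

Because the wait passed to can_read shrinks on every pass, a peer that trickles data cannot extend the total beyond $timeout seconds (to within the one second granularity of time()).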

        cheers

        tachyon

Re: Get Timeout with LWP
by tachyon (Chancellor) on Jun 21, 2004 at 06:28 UTC

    As noted above, LWP's idea of a timeout is fuzzy. To make the timeout behave as an absolute limit, you can add a little code to LWP/Protocol/http: in the SocketMethods package, near the end, add these lines:

    # Oops, patch removed for further testing (aka works per se but breaks other stuff)
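With the patch withdrawn, one alternative that leaves LWP untouched is to enforce a deadline from a content callback. This is only a sketch under stated assumptions: get_with_deadline is a hypothetical name, the deadline is checked only when a chunk of the body arrives (so slowly trickled headers are not covered), and the overshoot can be as large as the idle timeout. It relies on the documented ':content_cb' option of get(), where a die inside the callback aborts the request and is reported via the X-Died header.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url, aborting once $max_seconds of wall-clock time have passed.
# Returns the body on success, undef on failure or when the deadline hit.
sub get_with_deadline {
    my ($url, $idle_timeout, $max_seconds) = @_;

    my $ua       = LWP::UserAgent->new(timeout => $idle_timeout);
    my $deadline = time() + $max_seconds;
    my $body     = '';

    my $response = $ua->get(
        $url,
        ':content_cb' => sub {
            my ($chunk) = @_;
            # Dying here makes LWP abort the transfer.
            die "deadline of $max_seconds seconds exceeded\n"
                if time() > $deadline;
            $body .= $chunk;
        },
    );

    return undef unless $response->is_success;
    return undef if defined $response->header('X-Died');
    return $body;
}
```

Unlike alarm, this does not depend on signal delivery, which may make it the more portable choice on systems where alarm cannot interrupt socket reads.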

    cheers

    tachyon

Node Type: perlquestion [id://368319]
Approved by Zaxo
Front-paged by fuzzyping