PerlMonks  

Where is LWP::Simple::get error information stored?

by ajam (Acolyte)
on Sep 27, 2012 at 21:10 UTC (#996077=perlquestion)
ajam has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I wrote a short program which occasionally needed to download Wikipedia articles. To my frustration, the pages were not being downloaded. Luckily, I stumbled upon the thread "LWP::Simple::get($url) does not work for some urls", which basically states that the default user-agent is blocked. After changing the user-agent, my program worked like a charm.
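For reference, here is a minimal sketch of one way to change that user-agent while staying with LWP::Simple. The module will export its shared LWP::UserAgent object as $ua if you ask for it, and every LWP::Simple function uses that object. The agent string 'MyWikiFetcher/0.1' is a made-up example, not something Wikipedia specifically requires.

```perl
use strict;
use warnings;
use LWP::Simple qw(get $ua);   # $ua is LWP::Simple's shared LWP::UserAgent

# The default agent string names libwww-perl, which some sites block.
# 'MyWikiFetcher/0.1' is a hypothetical name: use one that identifies your program.
$ua->agent('MyWikiFetcher/0.1');

# Every later LWP::Simple call (get, getprint, getstore, ...) now sends
# this User-Agent header, e.g.:
# my $content = get('http://es.wikipedia.org/');
```

Because the object is shared, this only needs to happen once, near the top of the program.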

But now my question is, how can I programmatically determine errors like this from LWP? It appears nothing is stored in $! after the failed get attempt. Does LWP have its own error variable? It would be nice for a program like the following to relay the error information to the user.

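use warnings;
use strict;
use feature 'say';    # say() needs this (or "use 5.010")
use LWP::Simple;

my @urls = (
    'http://www.google.es/',
    'http://www.perlmonks.org/',
    'http://es.wikipedia.org/',
);

foreach my $url (@urls) {
    my $content = get $url;
    my ($site) = $url =~ /\.(.*)\./;    # e.g. "google", "perlmonks", "wikipedia"
    if ($content) {
        say "Received response from $site";
    }
    else {
        # get() gives no reason for the failure, and $! is not meaningful here
        say "No response from $site, error ($!)";
    }
}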

Re: Where is LWP::Simple::get error information stored?
by BrowserUk (Pope) on Sep 27, 2012 at 21:45 UTC
    how can I programmatically determine errors like this from LWP?

    The LWP::Simple POD for get explains:

    You will not be able to examine the response code or response headers (like 'Content-Type') when you are accessing the web using this function. If you need that information you should use the full OO interface (see the LWP::UserAgent manpage).
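As a rough sketch of what the OO interface gives you: LWP::UserAgent returns an HTTP::Response object whose status_line holds the reason for a failure. fetch_with_error and the agent string are hypothetical names chosen for illustration, not part of LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# One shared user agent; 'MyFetcher/0.1' is a hypothetical agent string.
my $ua = LWP::UserAgent->new( agent => 'MyFetcher/0.1' );

# Hypothetical helper: returns ($content) on success, or
# (undef, "<code> <message>") on failure, so the caller can report why.
sub fetch_with_error {
    my ($url) = @_;
    my $response = $ua->get($url);
    return $response->decoded_content if $response->is_success;
    return ( undef, $response->status_line );
}
```

In the question's loop this would become `my ($content, $error) = fetch_with_error($url);`, printing $error instead of $! when $content is undefined.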

    However, there is (kinda) a way around it. If you look at getprint(), it returns the HTTP response code, but of course you probably want the content in your program, not sent to stdout. getprint() prints to the currently selected output filehandle, so the trick I've used in the past is to open a ramfile (an in-memory filehandle) and select it before calling getprint(). That way you get both the content, in a variable, and the status code:

    my $content;
    open RAM, '>', \$content;
    my $stdoutsaved = select;
    select *RAM;
    my $status = getprint( $url );
    select $stdoutsaved;
    close RAM;
    ## Now the status is in $status and the content in $content.

    Quite why get() wasn't implemented to return both:

    my( $status, $content ) = get( $url );

    I guess we'll never know, but it is unlikely to change now.
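The ramfile trick works because getprint() writes to whatever filehandle select() currently points at. Below is a slightly more defensive sketch of the same idea. It restores the previously selected handle even if the wrapped code dies; capture_selected is a hypothetical helper name, and it works for any code that prints to the selected handle, not only getprint().

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with an in-memory filehandle selected,
# so any bare print() lands in a scalar instead of STDOUT.
# Returns the captured text followed by whatever $code returned.
sub capture_selected {
    my ($code) = @_;
    my $buffer = '';
    open my $ram, '>', \$buffer or die "Cannot open in-memory filehandle: $!";
    my $previous = select $ram;    # select() returns the previously selected handle
    my @result = eval { $code->() };
    my $error  = $@;
    select $previous;              # always restore, even if $code died
    close $ram;
    die $error if $error;
    return ( $buffer, @result );
}
```

With LWP::Simple loaded, `my ($content, $status) = capture_selected(sub { getprint($url) });` gives the same pair as the code above. getprint() still writes its own failure message to STDERR, which is not captured.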


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Thanks for the explanation and the example workaround. For the sake of maintainability, though, I think I will avoid using this crafty method.

        For the sake of maintainability though I think I will avoid using this crafty method.

        It is not at all clear to me why you think it need be a maintenance problem. Just wrap it in a function, something like:

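        # Wraps the ramfile trick so callers get ( $status, $content ) back.
        sub getWithStatus {
            my $url = shift;
            my $content;
            open my $RAM, '>', \$content;
            my $stdoutsaved = select;    # remember the current output handle
            select *$RAM;                # getprint() prints to the selected handle
            my $status = getprint( $url );
            select $stdoutsaved;         # restore the original output handle
            close $RAM;
            return $status, $content;
        }

        ...

        my( $status, $content ) = getWithStatus( $url );
        if( $status ne RC_OK ) {
            die "Get $url failed with HTTP status: $status";
        }

        ...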
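The status returned this way is only the numeric code. LWP::Simple also exports the HTTP::Status constants and functions (that is where RC_OK comes from), so one way to turn the code into a readable message is sketched below; describe_status is a hypothetical helper name. One caveat: LWP reports a failed connection as a 500 that it generated itself, so status_message(500) says "Internal Server Error" even when no server answered. Telling those cases apart needs the full status_line from the OO interface.

```perl
use strict;
use warnings;
use LWP::Simple;    # also exports HTTP::Status's is_success(), status_message(), RC_* ...

# Hypothetical helper: turn the numeric code returned by getprint(),
# getstore() or mirror() into a short human-readable message.
sub describe_status {
    my ($code) = @_;
    return "OK ($code)" if is_success($code);
    my $message = status_message($code) // 'Unknown status';
    return "failed: $code $message";
}
```

For example, `die "Get $url " . describe_status($status) unless is_success($status);` would replace the RC_OK comparison above.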

Re: Where is LWP::Simple::get error information stored?
by Anonymous Monk on Sep 28, 2012 at 00:06 UTC

    Sadly, it isn't stored anywhere. Even LWP::UserAgent itself doesn't store that stuff; it forces you to store it.

    WWW::Mechanize, on the other hand, stores the responses so you don't have to. It even (optionally; on by default) throws exceptions.
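A minimal sketch of that WWW::Mechanize behaviour, assuming the module is installed (it is not part of LWP). autocheck is switched off here so the stored response can be inspected instead of catching the exception with eval; fetch_or_report is a hypothetical helper name.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck is on by default (a failed get() dies); turn it off here
# so the failure can be inspected instead of caught with eval.
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Hypothetical helper: Mechanize keeps the last HTTP::Response, so the
# error details are still available after get() returns.
sub fetch_or_report {
    my ($url) = @_;
    $mech->get($url);
    return $mech->content if $mech->success;
    warn "Get $url failed: ", $mech->response->status_line, "\n";
    return;
}
```

After a failure, `$mech->status` and `$mech->response` remain available until the next request, which is the "stores the responses so you don't have to" part.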

      Thanks for the information. I will look into WWW::Mechanize.

      LWP::UserAgent doesn't?

      use LWP::UserAgent;
      my $ua = LWP::UserAgent->new;
      $ua->env_proxy;
      my $response = $ua->get('http://example.org/');
      print "HTTP response code was: ", $response->status_line, "\n";
      #print "Content was:\n", $response->content, "\n";

      HTTP response code was: 200 OK
      HTTP response code was: 404 Not Found
      HTTP response code was: 500 Can't connect to 127.0.0.1:80 (Invalid argument)
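The 500 in the last line was not sent by any server. LWP::UserAgent builds such responses itself when it cannot get an HTTP reply, for example when a connection is refused or a host does not resolve, and it marks them with a "Client-Warning: Internal response" header. Below is a sketch of how a program could use that header to tell the two kinds of 500 apart; describe_failure is a hypothetical helper name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns nothing for a successful response, otherwise
# the status line plus who produced it. LWP tags responses it generated
# itself with the header "Client-Warning: Internal response".
sub describe_failure {
    my ($response) = @_;
    return if $response->is_success;
    my $warning = $response->header('Client-Warning') // '';
    my $source  = $warning eq 'Internal response'
        ? 'LWP itself (no HTTP response was received)'
        : 'the server';
    return sprintf '%s (reported by %s)', $response->status_line, $source;
}
```

Usage: `if (my $why = describe_failure($ua->get($url))) { warn "$url: $why\n" }`.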

        Of course it doesn't; it forces you to store it:

        my $response = $ua->get('http://example.org/');

Approved by BrowserUk