LWP::Useragent doesn't work on certain HTTPS websites?

by sectokia (Pilgrim)
on Feb 14, 2019 at 11:37 UTC

sectokia has asked for the wisdom of the Perl Monks concerning the following question:

Hi wise monks, I have found that a simple LWP GET over HTTPS doesn't always work. Below is an example against a server on which it never seems to work:

use warnings;
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( ssl_opts => { verify_hostname => 0 } );
my $res = $ua->get('https://www.target.com.au/');
print $res->content;

The error I get is:

Status read failed: Connection reset by peer at /usr/share/perl5/Net/HTTP/Methods.pm line 282.

I had a look at that file and it's over my head. Any ideas? Thanks!

Replies are listed 'Best First'.
Re: LWP::Useragent doesn't work on certain HTTPS websites?
by bliako (Monsignor) on Feb 14, 2019 at 12:17 UTC

    try with debug-mode on:

    use LWP::ConsoleLogger::Easy qw(debug_ua);
    debug_ua($myUA, 6);              # log LWP requests and responses
    $IO::Socket::SSL::DEBUG = 3;     # SSL debugging mode

    Long shot: ssl_opts also supports SSL_version, e.g. SSL_version => 'TLSv1'. Maybe there is a protocol version mismatch that either your side or theirs can't handle?

    But before that, make sure your user-agent string is set to something sensible and not LWP's default 'libwww-perl/...'. Also set your LWP timeout (in seconds) to something long, e.g. $myUA->timeout(100);, and upgrade your modules if you can.
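
    The suggestions above (a pinned TLS version, a browser-like agent string, a longer timeout) could be combined in the constructor roughly as follows. This is only a sketch: 'TLSv1_2' and 'Mozilla/5.0' are placeholder values picked for illustration, not settings known to work for this particular site.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(
        agent    => 'Mozilla/5.0',                  # placeholder agent string
        timeout  => 100,                            # seconds
        ssl_opts => { SSL_version => 'TLSv1_2' },   # assumed value, passed on to IO::Socket::SSL
    );

    # Read the settings back to confirm what the object will use.
    print "agent:   ", $ua->agent, "\n";
    print "timeout: ", $ua->timeout, "\n";
    print "TLS:     ", $ua->ssl_opts('SSL_version'), "\n";
    ```

    Nothing is sent over the network here; the script only shows where each option goes.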

    I have also been troubled with 500 Server closed connection without sending any data back. I have come to an unfounded conclusion: maybe it has nothing to do with SSL because I can see the handshakes, the sockets opening etc. So, just maybe that's their way to tell me to bugger off.

    Btw, if in your headers you see 'Internal response', that's LWP issuing that and not the server. (I mean the wording of the error does not come from the Server - obviously if it's a timeout).
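
    One way to check for this in code is to look at the Client-Warning header, which LWP sets to 'Internal response' on responses it generates itself. The sketch below provokes such a response with a made-up URL scheme ('nosuchscheme' and the host name are hypothetical), so no server is contacted.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;

    # 'nosuchscheme' is not a real protocol, so LWP has to build the
    # error response itself instead of receiving one from a server.
    my $res = $ua->get('nosuchscheme://example.invalid/');

    my $warning = $res->header('Client-Warning') // '';
    if ($warning eq 'Internal response') {
        print "Generated by LWP itself: ", $res->status_line, "\n";
    }
    else {
        print "Sent by the server: ", $res->status_line, "\n";
    }
    ```

    The same check can be applied to the response from a real request to tell a client-side failure (timeout, connection reset) from an error page the server actually returned.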

    bw, bliako

Re: LWP::Useragent doesn't work on certain HTTPS websites?
by noxxi (Pilgrim) on Feb 15, 2019 at 06:37 UTC
    This is a common problem for sites which use the bot manager offered by Akamai CDN to protect against being crawled by bots:
    $ dig www.target.com.au
    ...
    www.target.com.au.      6886    IN      CNAME   shop.target.com.au.edgekey.net.
    shop.target.com.au.edgekey.net. 6886 IN CNAME   e1380.x.akamaiedge.net.
    
    The bot manager detects bots based on specific traits. Currently the criteria seem to be that Accept-Encoding and Accept-Language are set, that the User-Agent is something like Mozilla/5.0, and that Connection is Keep-Alive. If these conditions are not met, the client is treated as a bot, which might result in hanging connections or error messages. LWP automatically sets the Connection header to the expected value, but the others need to be set explicitly:
    use warnings;
    use strict;
    use LWP::UserAgent;
    
    my $ua = LWP::UserAgent->new();
    my $res = $ua->get('https://www.target.com.au/',
        'Accept-Language' => 'en-US',
        'User-Agent' => 'Mozilla/5.0',
        'Accept-Encoding' => 'identity',
    );
    print $res->content;
    
    See also Golang Http Get Request very slow, Strange CURL issue with a particular website SSL certificate, Scraping attempts getting 403 error or Requests SSL connection timeout over at stackoverflow.com for similar problems.
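
    If several requests go to the same site, the headers could also be set once on the user agent instead of on every get() call. A minimal sketch, reusing the header values from the example above; prepare_request is used here only to inspect the resulting headers without sending anything.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
    $ua->default_header( 'Accept-Language' => 'en-US' );
    $ua->default_header( 'Accept-Encoding' => 'identity' );

    # Apply the defaults to a request object without sending it,
    # so the headers can be checked offline.
    my $req = $ua->prepare_request(
        HTTP::Request->new( GET => 'https://www.target.com.au/' )
    );
    print $req->headers->as_string;
    ```

    Every later $ua->get(...) on this object would then carry the same three headers, unless a call overrides them.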
      Thanks... I figured it had to be the headers, since it worked in a web browser. I actually tried to copy the headers that Chrome sent when it fetched the page, but I missed Accept-Language...
