PerlMonks  

Request by LWP Useragent refused by the web server but not by others

by epoch4life (Initiate)
on Jun 10, 2014 at 06:48 UTC ( [id://1089387]=perlquestion )

epoch4life has asked for the wisdom of the Perl Monks concerning the following question:

I have successfully used LWP::UserAgent in the past to crawl web sites over HTTP or HTTPS and download data, but I just can't make it work with this new server. I suspect the server is deliberately refusing automated user agents. I read a post in this thread which suggested impersonating a Firefox request. I tried my best to do that, but to no avail.

More specifically, I'm trying to use LWP::UserAgent to automate the collection of data from a site, but the request always fails with this message:

500 Can't connect to tutorialregistration.uws.edu.au:443 (SSL connect attempt failed because of handshake problemserror:00000000:lib(0):func(0):reason(0))

I have narrowed it down to a single URL, https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do, which I can access directly from a browser but which fails with the above message when using LWP::UserAgent.

This can be shown via

    perl -MLWP::Simple -e "getprint 'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do'"

or

    use LWP::UserAgent;

    $ua  = new LWP::UserAgent;
    $req = new HTTP::Request 'GET' =>
        'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do';

    # impersonate a firefox browser
    $ua->agent("Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0");
    $req->header(
        'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-US,en;q=0.5',
        'Accept-Encoding' => 'gzip, deflate',
        'Cookie'          => '',
        'Referer'         => 'https://www.uws.edu.au/',
        'Connection'      => 'keep-alive',
    );

    $res = $ua->request($req);
    print "content-type:text/html\n\n";
    print $res->content;

In both cases, if I replace the URL with another HTTPS page (inside or outside the intranet), the code works fine. I really can't figure out what has gone wrong here. Please help. Many thanks.

David

Replies are listed 'Best First'.
Re: Request by LWP Useragent refused by the web server but not by others
by Khen1950fx (Canon) on Jun 10, 2014 at 07:33 UTC
    Don't forget to use strictures:
    use strict; use warnings;
    You can see what the handshake is doing by using IO::Socket::SSL.
        #!/usr/bin/perl
        use strict;
        use warnings;

        use IO::Socket::SSL qw(debug3);
        require LWP::UserAgent;
        require HTTP::Request;

        my $ua = LWP::UserAgent->new;
        $ua->agent(
            "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0");
        $ua->timeout(10);
        $ua->protocols_allowed(['https']);

        my $req = new HTTP::Request 'GET' =>
            'https://tutorialregistration.uws.edu.au/aplus/admin/adminLogin.do';
        $req->header(
            'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language' => 'en-US,en;q=0.5',
            'Accept-Encoding' => 'gzip, deflate',
            'Cookie'          => '',
            'Referer'         => 'https://www.uws.edu.au/',
            'Connection'      => 'keep-alive',
        );

        my $res = $ua->request($req);
        print "content-type:text/html\n\n";
        print $res->content;
      Many thanks to all the helpers. It looks like there are some Perl configuration problems, because I have now realised that the script does work with the Perl on my laptop. I'll get my system admin to fix the Perl configuration.
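
Since the same script works under one Perl and fails under another, a first step could be to compare the SSL-related modules on both machines. This is only a minimal sketch, assuming the usual LWP HTTPS stack (LWP::Protocol::https, IO::Socket::SSL, Net::SSLeay, Mozilla::CA); `module_version` is a hypothetical helper written for this illustration, not part of any of those modules.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# 'unknown' if it loads but declares no version, or 'not installed'.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    return $module->VERSION // 'unknown';
}

# The Perl interpreter itself.
printf "%-22s %s\n", 'perl', $^V;

# Modules that LWP normally relies on for https:// URLs (an assumption
# about the setup; adjust the list to match the local installation).
for my $module (qw(LWP LWP::Protocol::https IO::Socket::SSL Net::SSLeay Mozilla::CA)) {
    printf "%-22s %s\n", $module, module_version($module);
}

# Net::SSLeay can also report which OpenSSL library it is linked against.
if (eval { require Net::SSLeay; 1 }) {
    printf "%-22s %s\n", 'OpenSSL', Net::SSLeay::SSLeay_version();
}
```

Run it under both Perls and compare the output. A missing module, or a much older IO::Socket::SSL or OpenSSL on the failing machine, would be a plausible lead, though not proof of the cause.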
Re: Request by LWP Useragent refused by the web server but not by others (ssl handshake problem)
by Anonymous Monk on Jun 10, 2014 at 07:26 UTC
Re: Request by LWP Useragent refused by the web server but not by others
by sundialsvc4 (Abbot) on Jun 10, 2014 at 14:35 UTC

    Yep, “and the good news is,” a message such as "SSL handshake problem" tells you definitively that the SSL connection is never successfully getting established ... so the root cause of the problem is not what you fear. Your fears in this case are a red herring. The server isn’t refusing you. It never hears from you at all. Follow the links mentioned above to help resolve the problem. (And, if you have administrative access to the server that you are attempting to use, always check its server logs. Your own computer may also log failed connection attempts and provide useful diagnostic information.)
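
One way to check this from the script itself: when LWP cannot connect or complete the handshake, it builds the 500 response on the client side and marks it with a `Client-Warning: Internal response` header. The sketch below uses a hypothetical helper, `is_internal_response`, and hand-built HTTP::Response objects (with a placeholder host, example.org) so that it runs without touching the network.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true when LWP generated the response itself
# (connect or handshake failure, timeout) rather than receiving it
# from the remote server.
sub is_internal_response {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    return $warning eq 'Internal response';
}

# Offline demonstration with hand-built responses.
# This one mimics what LWP returns when the connection fails.
my $client_side = HTTP::Response->new(500, "Can't connect to example.org:443");
$client_side->header('Client-Warning' => 'Internal response');

# This one mimics a genuine error page sent by a server.
my $server_side = HTTP::Response->new(500, 'Internal Server Error');

for my $res ($client_side, $server_side) {
    printf "%s => %s\n", $res->status_line,
        is_internal_response($res)
            ? 'generated locally, the request was never answered'
            : 'answered by the server';
}
```

In a real script the same check would be applied to the object returned by `$ua->request($req)`. A locally generated 500 points at the connection or SSL setup rather than at the User-Agent or other request headers.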

      Thanks, and it turns out you are absolutely right.
