http://www.perlmonks.org?node_id=982713

jcabraham has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I'm trying to script command-line scraping of a vendor website hosted at my company. After login the site sends you through several levels of redirection. Firefox and Chrome handle this fine, but LWP ends up with a "Bad Request" response: after the "SetSessionVars.php" request (see the trace below) the server returns 400 Bad Request, whereas in the browser the same step successfully redirects to the home page. For the life of me I can't figure out what I'm doing wrong. Here's my code:

use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Message;
use HTTP::Request::Common qw(GET POST);

# $authUser and $authPw are set elsewhere in the script

my $ua = LWP::UserAgent->new();
push @{ $ua->requests_redirectable }, 'POST';

my $cookies = HTTP::Cookies->new(file => '/Users/jcabraham/.cookies.txt', autosave => 1, ignore_discard => 1);
$ua->cookie_jar($cookies);
$ua->default_header('Accept-Encoding' => scalar HTTP::Message::decodable());

# dump every request and response for tracing
$ua->add_handler("request_send", sub { shift->dump; return });
$ua->add_handler("response_done", sub { shift->dump; return });

# log off first, just start clean
my $auth_response = $ua->request(GET "http://ap1492-dsr/LogOff.php");

# now login
my $response = $ua->request(POST "http://ap1492-dsr/authenticate.php", [user => $authUser, password => $authPw, TimezoneOffset => 14400, submit => 'User Login']);

# scrape home page
$response = $ua->request(GET "http://ap1492-dsr/Welcome.php");
if ($response->is_success) {
    my $html = $response->decoded_content;
    print $html;
}

And here's the trace output from LWP:

macbook:scripts jcabraham$ link_aperio.pl 12 12
GET http://ap1492-dsr/LogOff.php
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:13 GMT
Pragma: no-cache
Location: Login.php
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:13 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
X-Powered-By: PHP/5.3.5

(no content)
GET http://ap1492-dsr/Login.php
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:13 GMT
Pragma: no-cache
Server: Apache
Content-Length: 5078
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
Link: <./CSS/masterstyle.css?11.1.1.760>; rel="stylesheet"; type="text/css"
Link: <./CSS/blue.css?11.1.1.760>; rel="stylesheet"; type="text/css"
Link: <./CSS/blueLogin.css?11.1.1.760>; rel="stylesheet"; type="text/css"
Link: <./CSS/custom.css?11.1.1.760>; rel="stylesheet"; type="text/css"
Refresh: text/html
Set-Cookie: memory_limit=deleted; expires=Wed, 20-Jul-2011 20:23:12 GMT; path=/
Set-Cookie: PHPSESSID=1342729393; path=/
Set-Cookie: PHPSESSID=681877b8eaa1b7fd3a35cc9db713cfa7; path=/
Set-Cookie: PHPSESSID=1342557122; path=/; httponly
Title: Spectrum - Login
X-Powered-By: PHP/5.3.5

\r
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/loose.dtd"><html><head><meta content='text/html' http-equiv='refresh'>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><TITLE>Spectrum - Login</TITLE>
<link type='text/css' rel='stylesheet' href='./CSS/masterstyle.css?11.1.1.760'>
<script type='text/javascript' src='./Spectrum.js?11.1.1.760'> </script>
<script type='text/javascript' src='./Keyboard.js?11.1.1.760'> </script>
<script type='text/javascript' src='....
(+ 4566 more bytes not shown)
POST http://ap1492-dsr/authenticate.php
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Content-Length: 70
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
Cookie2: $Version="1"

user=jabraham&password=da!syd0g&TimezoneOffset=14400&submit=User+Login
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:14 GMT
Pragma: no-cache
Location: Disclaimer.php
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
Set-Cookie: PHPSESSID=1342729394; path=/
X-Powered-By: PHP/5.3.5

(no content)
GET http://ap1492-dsr/Disclaimer.php
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:14 GMT
Pragma: no-cache
Location: DetermineRole.php
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
X-Powered-By: PHP/5.3.5

(no content)
GET http://ap1492-dsr/DetermineRole.php
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:14 GMT
Pragma: no-cache
Location: DetermineHierarchy.php?RoleId=102&HierarchyId=3
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
X-Powered-By: PHP/5.3.5

(no content)
GET http://ap1492-dsr/DetermineHierarchy.php?RoleId=102&HierarchyId=3
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection: close
Date: Thu, 19 Jul 2012 20:23:14 GMT
Pragma: no-cache
Location: ../SetSessionVars.php?RoleId=102&HierarchyId=3
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
X-Powered-By: PHP/5.3.5

(no content)
GET http://ap1492-dsr/../SetSessionVars.php?RoleId=102&HierarchyId=3
Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
User-Agent: libwww-perl/5.837
Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
Cookie2: $Version="1"

(no content)
HTTP/1.1 400 Bad Request
Connection: close
Date: Thu, 19 Jul 2012 20:23:15 GMT
Server: Apache
Content-Length: 286
Content-Type: text/html; charset=iso-8859-1
Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
Client-Peer: 10.100.50.80:80
Client-Response-Num: 1
Title: 400 Bad Request

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache Server at ap1492-dsr Port 80</address>
</body></html>
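
One detail in the trace may be worth checking: the last redirect's Location header is "../SetSessionVars.php?RoleId=102&HierarchyId=3", and the request that gets the 400 goes to "http://ap1492-dsr/../SetSessionVars.php?...", with the ".." still in the path. Browsers generally drop ".." segments that would climb above the root, so that could be the difference. The sketch below is only a guess at the cause, not a confirmed fix. It uses the URI module (which LWP relies on to resolve redirects) and its documented $URI::ABS_REMOTE_LEADING_DOTS setting; resolve_location is a hypothetical helper name introduced purely for illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not part of LWP): resolve a Location header against
# the URL of the response that carried it, optionally dropping excess
# leading ".." segments the way browsers generally do.
sub resolve_location {
    my ($location, $base, $strip_leading_dots) = @_;

    # $URI::ABS_REMOTE_LEADING_DOTS is a documented global of the URI module;
    # "local" keeps the change scoped to this call.
    local $URI::ABS_REMOTE_LEADING_DOTS = $strip_leading_dots ? 1 : 0;

    return URI->new_abs($location, $base)->as_string;
}

# Values copied from the trace above.
my $base     = 'http://ap1492-dsr/DetermineHierarchy.php?RoleId=102&HierarchyId=3';
my $location = '../SetSessionVars.php?RoleId=102&HierarchyId=3';

print "default:  ", resolve_location($location, $base, 0), "\n";
print "stripped: ", resolve_location($location, $base, 1), "\n";
```

If the stripped form turns out to be what the server accepts, setting $URI::ABS_REMOTE_LEADING_DOTS = 1 near the top of the script, before the first request, should make LWP resolve the redirect the same way. That is an assumption based on how LWP resolves relative Location headers and has not been tested against this site.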