PerlMonks  

Re: GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found

by rizzo (Deacon)
on Mar 10, 2018 at 18:06 UTC


in reply to GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found

The whole exercise consists of: POST, GET, GET.

Are you sure?

Could it be that you're actually doing a

GET, POST, GET, GET

in Firefox (e.g. visiting the site with a GET and then POSTing some form data)
while you're doing a

POST, GET, GET

in LWP?

If this is the case, your LWP version is probably missing some cookies, which would make the server respond differently.

Re^2: GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found
by bliako (Monsignor) on Mar 11, 2018 at 01:23 UTC

    Good thinking! Indeed there are other GETs before the ones I described (in a previous phase which completes successfully), so the cookie actually is set. Thanks.

    I think I am getting closer, though I still have to test it. After using Wireshark (as per 7stud's and haukex's advice) and LWP::ConsoleLogger::Easy, I realised that LWP::UserAgent could be responding to a '302 Found' automatically and following the redirect. That would mean BOTH my code (via LWP) and LWP itself (responding to the 302 automatically) are sending a request to the redirect location, with mine going out after LWP has finished its own. And that messes things up.

    The LWP::UserAgent man page states that there is a list called 'requests_redirectable' which contains the request methods for which redirects are followed. By default, 'GET' and 'HEAD' are included. POST is not.

    Given also that LWP's 'max_redirect' is 7 by default, it sounds to me as if a GET that returns a 302 will cause LWP to follow the redirect automatically. But I am also doing that myself in the program, having assumed that LWP would not follow redirects (or having forgotten that it does).

    In my 'scraping exercise' there is a long list of earlier POSTs which return a 302, but this is the first time a GET does. LWP did not follow the POST redirects, so all was OK, but it does follow the GET redirect (because GET is in LWP's 'requests_redirectable' list), and that is where the problem arose.
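
One way to check this without a packet capture is to ask LWP directly. The sketch below prints the two settings mentioned above and then walks the chain of responses that LWP followed on its own. The helper name `redirect_chain` and the example.com URL are made up for illustration, and the responses are built by hand so that it runs offline; in a real script `$final` would be whatever `$ua->get(...)` returned.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;

# The defaults discussed above: how many hops LWP will follow,
# and for which request methods it follows them at all.
printf "max_redirect: %d\n", $ua->max_redirect;
printf "requests_redirectable: %s\n",
    join ', ', @{ $ua->requests_redirectable };

# Hypothetical helper: describe each redirect LWP followed before
# it handed back the final response (oldest hop first).
sub redirect_chain {
    my ($response) = @_;
    my @hops;
    for my $r ($response->redirects) {
        my $location = $r->header('Location') // '(no Location)';
        push @hops, $r->code . ' -> ' . $location;
    }
    return @hops;
}

# Offline stand-in for a followed redirect: a 302 hop linked to
# the final 200 via previous(), the same link LWP sets up itself.
my $hop   = HTTP::Response->new(302, 'Found',
    [ Location => 'http://example.com/next' ]);
my $final = HTTP::Response->new(200, 'OK');
$final->previous($hop);

print "$_\n" for redirect_chain($final);
```

If the list is non-empty for a response you thought was a single request, LWP has already followed the redirect and a second manual request to the same Location would be a duplicate.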

    thanks

      I can now say that the problem was indeed that LWP was following redirects (as it should), while I was also following them myself by issuing another request via LWP.

      I solved it by setting

      $ua->requests_redirectable([]);
      which tells LWP::UserAgent not to follow any redirects for any request.

      (setting

      $ua->requests_redirectable(['GET']);
      would allow only GET to be followed by LWP).
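
With automatic following switched off, the script has to pick up the Location header itself, exactly once. A minimal sketch of that step, assuming a hypothetical helper `redirect_target` and placeholder example.com URLs; the 302 response is constructed by hand so the snippet runs without a network:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI;

my $ua = LWP::UserAgent->new;
$ua->requests_redirectable([]);    # 3xx responses now come back as-is

# Hypothetical helper: absolute redirect target of a response,
# or undef when the response is not a redirect.
sub redirect_target {
    my ($response) = @_;
    return unless $response->is_redirect;
    my $location = $response->header('Location');
    return unless defined $location;
    # Resolve a possibly relative Location against the response base.
    return URI->new_abs($location, $response->base);
}

# Offline stand-in for what $ua->request($request) would return.
my $request  = HTTP::Request->new(GET => 'http://example.com/app/start');
my $response = HTTP::Response->new(302, 'Found',
    [ Location => 'step2?id=1' ]);
$response->request($request);

my $target = redirect_target($response);
print "next request goes to: $target\n";

# In a live script the follow-up would then be issued once, e.g.:
#   my $next = $ua->get($target);
```

Keeping the follow-up in one place like this makes it harder to end up with both LWP and the script requesting the same redirect target.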

      I have also discovered another problem with allowing UA to follow redirects. In a redirect, the server responds with a 302 status (or another 30x) and a Location header containing the URL of the redirect target. UA extracts this Location URL and issues another request to it.

      The problem is that the server sometimes sends back a relative URL, which UA tries to make absolute. In my case, UA failed to do that correctly. So even if I had allowed UA to follow the redirect, it would have failed by sending a malformed URL to the server.

      UA has the following code to convert the url:

      my $referral_uri = $response->header('Location');
      {
          # Some servers erroneously return a relative URL for redirects,
          # so make it absolute if it not already is.
          local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
          my $base = $response->base;
          $referral_uri = "" unless defined $referral_uri;
          $referral_uri = $HTTP::URI_CLASS->new($referral_uri, $base)->abs($base);
      }
      $referral->uri($referral_uri);

      In my case:

      base='http://server.com/ABC/afilename1?op=678'
      referral='../../ABC/XYZ/KLM/afilename2?aa=123'
      and the calculated new referral came out as:
      http://server.com/../ABC/XYZ/KLM/afilename2?aa=123

      instead of the correct one of:

      http://server.com/ABC/XYZ/KLM/afilename2?aa=123

      Maybe this is expected behaviour from URI->abs()? I will send a bug report just in case.
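
For what it's worth, the result can be reproduced with the URI module alone, using the two strings given above. The relative URL climbs one level higher than the base path has directories, and as far as I know URI's abs() keeps such surplus ".." segments by default. The URI documentation describes a package variable, `$URI::ABS_REMOTE_LEADING_DOTS`, that makes abs() drop them instead, which is also what RFC 3986 style resolution (and hence a browser) does. A small sketch, not tested against every URI release:

```perl
use strict;
use warnings;
use URI;

my $base     = 'http://server.com/ABC/afilename1?op=678';
my $relative = '../../ABC/XYZ/KLM/afilename2?aa=123';

# Default resolution: the second ".." has nothing left to remove,
# so it stays in the path of the absolute URL.
my $default = URI->new($relative)->abs($base);
print "default:  $default\n";
# prints: default:  http://server.com/../ABC/XYZ/KLM/afilename2?aa=123

# With the documented switch set, leading dot segments are dropped.
my $stripped = do {
    local $URI::ABS_REMOTE_LEADING_DOTS = 1;
    URI->new($relative)->abs($base);
};
print "stripped: $stripped\n";
# prints: stripped: http://server.com/ABC/XYZ/KLM/afilename2?aa=123
```

So the URL LWP computed looks like documented URI behaviour rather than a miscalculation. Whether setting that variable globally is appropriate for a given scraper is a separate judgement call.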

      Thanks Monks
