Different answers for script and browser (LWP)

on Jun 18, 2013 at 18:43 UTC
Sly_G has asked for the wisdom of the Perl Monks concerning the following question:

The site I'm parsing with a Perl script recently moved to human-readable URLs. I'm trying to get the redirects from the old "id" requests to the current addresses. For example, when I go to "" in a browser, the server redirects it to ""

But when I try to get this new location in my script, I don't get a 301 answer; it returns "200 OK" for some reason.


use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Headers;

$ua = LWP::UserAgent->new;

# Hyphenated header names must be quoted: => only auto-quotes plain identifiers
$hh = HTTP::Headers->new(
    'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
    Accept            => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
    'Accept-Encoding' => 'gzip, deflate',
    Connection        => 'keep-alive',
);
$ua->default_headers( $hh );

$cookie_jar = HTTP::Cookies->new( );
$ua->cookie_jar($cookie_jar);

@rename = (
    294 ,
    9806 ,
    9807 ,
);

for $ren (@rename) {
    $res = $ua->get("$ren");
    print $res->header('Location')."\n";
}

I used an HTTP sniffer to see what the browser sends, and there's nothing special, really:

GET /show.php?id=294 HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,ru;q=0.3
Accept-Encoding: gzip, deflate
Cookie: __utma=83753984.1287093182.1370328704.1371539232.1371576574.7; __utmz=83753984.1370328704.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmb=83753984.10.10.1371576574; _ym_visorc=w; PHPSESSID=4p0ql1mitskhkbg3os47v1hc11; __utmc=83753984
Connection: keep-alive

By accident I stumbled on this: if I use $ua->get("$ren 0"), i.e. a space and some characters after the URL string, I get a completely different response, and there it is: "301 moved" and the new location.

I can't understand what's happening.

Re: Different answers for script and browser (LWP)
by rnewsham (Chaplain) on Jun 18, 2013 at 21:39 UTC

    LWP follows the 301 automatically, so the response you get back is the 200 from the new location. The details of the redirect chain are available through $res->previous. I have modified your code to read Location from the previous response, which should do what you want. I have also added use strict and use warnings, as that is always sensible.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTTP::Headers;

    my $ua = LWP::UserAgent->new;

    my $hh = HTTP::Headers->new(
        'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
        Accept            => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
        'Accept-Encoding' => 'gzip, deflate',
        Connection        => 'keep-alive',
    );
    $ua->default_headers( $hh );

    my $cookie_jar = HTTP::Cookies->new( );
    $ua->cookie_jar($cookie_jar);

    my @rename = (
        294 ,
        9806 ,
        9807 ,
    );

    for my $ren (@rename) {
        my $res = $ua->get("$ren");
        # previous() holds the 301 response that LWP followed to reach $res
        print $res->previous->header('Location')."\n";
    }

    Output

    /catalog/amulets/
    /catalog/amulets/the_cult/
    /catalog/amulets/aztek/
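One caveat with calling previous directly: it returns undef when no redirect happened, so chaining ->header onto it would die for an id that is served without a redirect. A minimal sketch of a more defensive variant, using the redirects method of HTTP::Response to walk the whole chain. The helper name redirect_locations is hypothetical, and the responses are built by hand here as a stand-in for what $ua->get() would return, so the snippet runs without touching the network.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: collect the Location header of every redirect
# that led up to $res, oldest hop first. An empty list means the
# request was answered directly.
sub redirect_locations {
    my ($res) = @_;
    my @locations = map { $_->header('Location') } $res->redirects;
    return @locations;
}

# Hand-built stand-in (assumption) for a response that LWP reached
# after following a single 301.
my $first = HTTP::Response->new(301, 'Moved Permanently',
                                [ Location => '/catalog/amulets/' ]);
my $final = HTTP::Response->new(200, 'OK');
$final->previous($first);

print "$_\n" for redirect_locations($final);

# A response with no previous() yields an empty list instead of dying.
my @none = redirect_locations(HTTP::Response->new(200, 'OK'));
print "no redirect\n" unless @none;
```

In the loop above, redirect_locations($res) could replace the direct $res->previous->header('Location') call.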
      Alternatively, there is $ua->simple_request(), which does not follow redirects.
      Wow, thanks a lot! I wouldn't have figured that out by myself.
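A rough sketch of the simple_request() route, for illustration. Unlike get(), simple_request() takes an HTTP::Request object rather than a URL string. The helper name redirect_target and the example.com URL are assumptions, and a canned 301 registered through add_handler stands in for the real site so that the snippet runs offline; drop that handler to query a real server.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: send one request without following redirects and
# return the Location header, or undef if the answer is not a redirect.
sub redirect_target {
    my ($ua, $url) = @_;
    my $res = $ua->simple_request(HTTP::Request->new(GET => $url));
    return $res->is_redirect ? $res->header('Location') : undef;
}

my $ua = LWP::UserAgent->new;

# Stand-in for the real server (assumption): a request_send handler that
# returns a response short-circuits the network call.
$ua->add_handler(request_send => sub {
    return HTTP::Response->new(301, 'Moved Permanently',
                               [ Location => '/catalog/amulets/' ]);
});

# Placeholder URL, not the site from the question.
my $target = redirect_target($ua, 'http://example.com/show.php?id=294');
print defined $target ? "$target\n" : "no redirect\n";
```

Another option along the same lines is to construct the agent with max_redirect => 0, in which case get() should also hand back the 3xx response itself rather than the page behind it.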

