PerlMonks  

Different answers for script and browser (LWP)

by Sly_G (Novice)
on Jun 18, 2013 at 18:43 UTC ( #1039627=perlquestion )
Sly_G has asked for the wisdom of the Perl Monks concerning the following question:

The site I'm parsing with a Perl script recently moved to human-readable URLs. I'm trying to collect the redirects from the old "id" requests to the current addresses. For example, when I go to "http://www.giftman.ru/show.php?id=294" in a browser, the server redirects to "http://www.giftman.ru/catalog/amulets/".

But when I try to get this new location in my script, I don't get a 301 response; it returns "200 OK" for some reason.

Script:

use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Headers;

$ua = LWP::UserAgent->new;

# Hyphenated header names are quoted; unquoted, Perl would parse
# User-Agent as a subtraction.
$hh = HTTP::Headers->new(
    'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
    Accept            => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
    'Accept-Encoding' => 'gzip, deflate',
    Connection        => 'keep-alive',
);
$ua->default_headers( $hh );

$cookie_jar = HTTP::Cookies->new( );
$ua->cookie_jar($cookie_jar);

@rename = (
    294,
    9806,
    9807,
);

for $ren (@rename) {
    $res = $ua->get("http://www.giftman.ru/show.php?id=$ren");
    print $res->header('Location')."\n";
}

I used an HTTP sniffer to see what the browser sends, and there's nothing special, really:

GET /show.php?id=294 HTTP/1.1
Host: www.giftman.ru
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,ru;q=0.3
Accept-Encoding: gzip, deflate
Cookie: __utma=83753984.1287093182.1370328704.1371539232.1371576574.7; __utmz=83753984.1370328704.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmb=83753984.10.10.1371576574; _ym_visorc=w; PHPSESSID=4p0ql1mitskhkbg3os47v1hc11; __utmc=83753984
Connection: keep-alive

By accident I stumbled on this: if I use $ua->get("http://www.giftman.ru/show.php?id=$ren 0"), i.e. with a space and some characters after the URL string, I get a completely different response, and there it is: "301 moved" and the new location.

I can't understand what's happening.

Re: Different answers for script and browser (LWP)
by rnewsham (Friar) on Jun 18, 2013 at 21:39 UTC

    LWP follows the 301 automatically, so you get the 200 from the new location. The details of the redirect chain it followed are available through $res->previous. I have modified your code to get the Location from previous, and it should do what you want. I have also added use strict and use warnings, as that is always sensible.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTTP::Headers;

    my $ua = LWP::UserAgent->new;

    my $hh = HTTP::Headers->new(
        'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
        Accept            => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
        'Accept-Encoding' => 'gzip, deflate',
        Connection        => 'keep-alive',
    );
    $ua->default_headers( $hh );

    my $cookie_jar = HTTP::Cookies->new( );
    $ua->cookie_jar($cookie_jar);

    my @rename = (
        294,
        9806,
        9807,
    );

    for my $ren (@rename) {
        my $res = $ua->get("http://www.giftman.ru/show.php?id=$ren");
        # previous holds the redirect response that led to the final 200
        print $res->previous->header('Location')."\n";
    }

    Output:

    /catalog/amulets/
    /catalog/amulets/the_cult/
    /catalog/amulets/aztek/
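If a request might pass through more than one redirect, or none at all, the chain can be walked with the response's redirects method instead of calling previous directly. The following is only a sketch: redirect_locations is a hypothetical helper name, and the response objects are built by hand to stand in for what $ua->get() would return, so it runs without touching the network.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the Location header of every redirect
# response that led to $res, oldest first. Yields an empty list when
# the request was not redirected, so there is no undef ->previous to
# trip over.
sub redirect_locations {
    my ($res) = @_;
    return map { $_->header('Location') } $res->redirects;
}

# Hand-built stand-in for a real LWP response: a 301 followed by a 200.
my $redirect = HTTP::Response->new(
    301, 'Moved Permanently',
    [ Location => '/catalog/amulets/' ],
);
my $final = HTTP::Response->new(200, 'OK');
$final->previous($redirect);

print "$_\n" for redirect_locations($final);
```

With a live user agent, the same helper would be called as redirect_locations($ua->get($url)).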
      Alternatively, there is $ua->simple_request() which does not redirect.
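A minimal sketch of the simple_request approach follows. The helper name redirect_target is hypothetical, and the request_send handler that returns a canned 301 is an assumption added only so the example runs offline; against the real site you would leave the handler out. Since the Location header may be relative (as in the output shown earlier), the sketch resolves it against the requested URL with URI->new_abs.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI;

# Hypothetical helper: send one GET without following redirects and
# return the absolute redirect target, or undef if the server did not
# answer with a redirect carrying a Location header.
sub redirect_target {
    my ($ua, $url) = @_;
    my $res = $ua->simple_request(HTTP::Request->new(GET => $url));
    return unless $res->is_redirect;
    my $location = $res->header('Location');
    return unless defined $location;
    return URI->new_abs($location, $url)->as_string;
}

my $ua = LWP::UserAgent->new;

# Demo-only assumption: answer every request locally with a canned 301
# so nothing is sent over the network. Remove this for real use.
$ua->add_handler(request_send => sub {
    return HTTP::Response->new(
        301, 'Moved Permanently',
        [ Location => '/catalog/amulets/' ],
    );
});

print redirect_target($ua, 'http://www.giftman.ru/show.php?id=294'), "\n";
```

Compared with reading $res->previous after get(), this avoids fetching the destination page when only the new address is wanted.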
      Wow, thanks a lot! I wouldn't have figured that out by myself.

Node Type: perlquestion [id://1039627]
Approved by Corion
Front-paged by MidLifeXis