PerlMonks  

Different answers for script and browser (LWP)

by Sly_G (Novice)
on Jun 18, 2013 at 18:43 UTC (#1039627=perlquestion)
Sly_G has asked for the wisdom of the Perl Monks concerning the following question:

A site that I'm parsing with a Perl script recently moved to human-readable URLs. I'm trying to find out where the old "id" requests now redirect. For example, when I go to "http://www.giftman.ru/show.php?id=294" in a browser, the site's server redirects it to "http://www.giftman.ru/catalog/amulets/"

But when I try to get this new location in my script, I don't get a 301 response; it returns "200 OK" for some reason.

Script:

use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Headers;

$ua = LWP::UserAgent->new;
$hh = HTTP::Headers->new(
    User-Agent => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
    Accept => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    Accept-Language => 'en-us,en;q=0.7,ru;q=0.3',
    Accept-Encoding => 'gzip, deflate',
    Connection => 'keep-alive',
);
$ua->default_headers( $hh );
$cookie_jar = HTTP::Cookies->new( );
$ua->cookie_jar($cookie_jar);

@rename = ( 294 , 9806 , 9807 , );
for $ren (@rename) {
    $res = $ua->get("http://www.giftman.ru/show.php?id=$ren");
    print $res->header('Location')."\n";
}

I used an HTTP sniffer to see what's going on with the browser, and there's nothing special, really:

GET /show.php?id=294 HTTP/1.1
Host: www.giftman.ru
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,ru;q=0.3
Accept-Encoding: gzip, deflate
Cookie: __utma=83753984.1287093182.1370328704.1371539232.1371576574.7; __utmz=83753984.1370328704.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmb=83753984.10.10.1371576574; _ym_visorc=w; PHPSESSID=4p0ql1mitskhkbg3os47v1hc11; __utmc=83753984
Connection: keep-alive
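To compare what the script sends with the browser capture above, LWP can show the request it is about to send without touching the network. This is a minimal sketch, assuming libwww-perl 6.x: prepare_request applies the user agent's default headers to a hand-built request, and as_string prints the result. Some headers, such as Host, are only added later by the protocol layer, so they won't appear in this dump. Note that the header names are quoted here: Perl's => only auto-quotes a plain identifier, so a bareword like User-Agent is not taken as the string 'User-Agent'.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Headers;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# Hyphenated header names must be quoted; '=>' only auto-quotes
# a plain identifier such as Accept or Connection.
$ua->default_headers(HTTP::Headers->new(
    'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
    'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
    Connection        => 'keep-alive',
));

# Build the request by hand and let the user agent fill in its defaults,
# without sending anything over the network.
my $req = HTTP::Request->new(GET => 'http://www.giftman.ru/show.php?id=294');
$ua->prepare_request($req);

# Host and other protocol-level headers are added later, so they are
# not part of this dump.
print $req->as_string;
```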

By accident I stumbled on this: if I use $ua->get("http://www.giftman.ru/show.php?id=$ren 0"), i.e. a space and some characters after the URL string, I get a completely different response, and there it is: "301 Moved" and the new location.
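One way to see what the trailing " 0" changes is to look at the URL that LWP builds from that string. This sketch uses the URI module, which LWP relies on to parse request URLs: the space is percent-encoded, so the server receives a different query string (id=294%200) than it gets from the browser. How the site then handles that query is up to the server; the sketch only shows what is sent.

```perl
use strict;
use warnings;
use URI;

# The URL from the question, with the accidental " 0" appended.
my $uri = URI->new('http://www.giftman.ru/show.php?id=294 0');

# URI->new percent-encodes characters that are not allowed in a URI,
# so the embedded space becomes %20.
print $uri->as_string, "\n";     # http://www.giftman.ru/show.php?id=294%200
print $uri->path_query, "\n";    # /show.php?id=294%200

# Decoding the query shows the id value the server is asked for.
my %form = $uri->query_form;
print "id = '$form{id}'\n";
```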

I can't understand what's happening.

Re: Different answers for script and browser (LWP)
by rnewsham (Hermit) on Jun 18, 2013 at 21:39 UTC

    LWP follows the 301 automatically, so you get the 200 from the new location. The earlier responses in the redirect chain are available through $res->previous. I have modified your code to read the Location header from the previous response, and it should do what you want. I have also added use strict and use warnings, as that is always sensible.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTTP::Headers;

    my $ua = LWP::UserAgent->new;
    my $hh = HTTP::Headers->new(
        'User-Agent'      => 'Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0',
        Accept            => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-us,en;q=0.7,ru;q=0.3',
        'Accept-Encoding' => 'gzip, deflate',
        Connection        => 'keep-alive',
    );
    $ua->default_headers( $hh );
    my $cookie_jar = HTTP::Cookies->new( );
    $ua->cookie_jar($cookie_jar);

    my @rename = ( 294 , 9806 , 9807 , );
    for my $ren (@rename) {
        my $res = $ua->get("http://www.giftman.ru/show.php?id=$ren");
        # get() follows redirects; the 301, if any, is kept in ->previous
        my $prev = $res->previous;
        my $location = $prev ? $prev->header('Location') : '(no redirect)';
        print "$location\n";
    }
    Output:

    /catalog/amulets/
    /catalog/amulets/the_cult/
    /catalog/amulets/aztek/
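The redirect-following behaviour described in this reply can be reproduced offline. The sketch below is a mock, not the real site, and assumes libwww-perl 6.x: a hypothetical request_send handler answers every request locally (returning a response from that handler stops LWP from opening a connection), with a 301 for show.php?id=294 and a 200 for everything else. get() follows the redirect, so the returned response is the 200, while the 301 and its Location header sit in previous. The code checks that previous is defined before calling header on it, because a request that was not redirected has no previous response.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;

# Hypothetical stand-in for the server: no network access happens.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    my $reply = $request->uri =~ m{/show\.php\?id=294$}
        ? HTTP::Response->new(301, 'Moved Permanently',
              [Location => 'http://www.giftman.ru/catalog/amulets/'])
        : HTTP::Response->new(200, 'OK');
    $reply->request($request);    # link the response to its request
    return $reply;
});

my $res = $ua->get('http://www.giftman.ru/show.php?id=294');

# get() followed the redirect, so this is the final 200 response ...
print $res->status_line, "\n";

# ... and the 301 that led here is kept in ->previous.
if (my $prev = $res->previous) {
    print $prev->code, ' ', $prev->header('Location'), "\n";
}
else {
    print "no redirect\n";
}
```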
      Alternatively, there is $ua->simple_request(), which does not follow redirects.
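A hypothetical offline mock can illustrate the simple_request alternative: it sends exactly one request and hands back the 301 itself. Note that simple_request takes an HTTP::Request object rather than a URL string. As in any mock, the request_send handler here is only an assumed stand-in for the real server (libwww-perl 6.x assumed).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

my $ua = LWP::UserAgent->new;

# Hypothetical stand-in for the server (no network access): answer the old
# id URL with a 301 and everything else with a 200.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    my $reply = $request->uri =~ m{/show\.php\?id=294$}
        ? HTTP::Response->new(301, 'Moved Permanently',
              [Location => 'http://www.giftman.ru/catalog/amulets/'])
        : HTTP::Response->new(200, 'OK');
    $reply->request($request);
    return $reply;
});

# simple_request() does not follow redirects, so the 301 comes back as-is.
my $req = HTTP::Request->new(GET => 'http://www.giftman.ru/show.php?id=294');
my $res = $ua->simple_request($req);

print $res->code, ' ', ($res->header('Location') // '(none)'), "\n";
```

Setting $ua->max_redirect(0) is another way to make get() stop at the first response instead of following the Location header.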
      Wow, thanks a lot! I wouldn't have figured that out by myself.

Node Type: perlquestion [id://1039627]
Approved by Corion
Front-paged by MidLifeXis