PerlMonks  

Using LWP::Simple to read a redirected page

by MorayJ (Sexton)
on Nov 13, 2012 at 17:43 UTC
MorayJ has asked for the wisdom of the Perl Monks concerning the following question:

Hi

The UK government has changed its website, and I'm trying to check whether the links I have still work with the new structure.

I'm using LWP for this (LWP::UserAgent, in the code below).

If I put in the web address https://www.insolvencydirect.gov.uk/isolv, it very kindly returns http://www.bis.gov.uk/insolvency when I use $request->uri, where $request comes from:

    my $browser  = LWP::UserAgent->new;
    my $response = $browser->get( $url );
    my $request  = $response->request();

That is where the site now sends you if you go to that URL.

The difficulty comes with other links, such as www.direct.gov.uk/en/Motoring/OwningAVehicle/TaxationClasses/DG_4022042. In a browser this takes you to https://www.gov.uk/vehicle-exempt-from-car-tax, but $request->uri just returns the original URL I put in.

What are they doing differently? What do I need to do differently? I guess it's probably more of a web question than just a Perl one.

Thanks for any advice

MorayJ
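When LWP::UserAgent follows redirects, the response it hands back is the last one in the chain; each earlier hop should be reachable through $response->previous, and each response's ->request->uri is the URL that hop asked for. The following is a minimal sketch that prints the whole chain for one URL, assuming libwww-perl is installed. redirect_chain and main are hypothetical helpers for illustration, not part of LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk back through an HTTP::Response-like object.
# ->previous links to the response LWP followed a redirect from, and
# ->request->uri is the URL that particular hop requested.
# Returns the requested URLs oldest first, ending with the final one.
sub redirect_chain {
    my ($response) = @_;
    my @urls;
    for (my $r = $response; $r; $r = $r->previous) {
        my $uri = $r->request->uri;
        unshift @urls, "$uri";    # URI objects stringify to the URL
    }
    return @urls;
}

# Only touches the network when a URL is given on the command line.
sub main {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded here so redirect_chain has no hard dependency
    my $browser  = LWP::UserAgent->new;
    my $response = $browser->get($url);
    print $response->status_line, "\n";
    print "  $_\n" for redirect_chain($response);
}

main(@ARGV) if @ARGV;
```

Run as, for example, `perl chain.pl https://www.insolvencydirect.gov.uk/isolv`; the last URL printed should be the same one $response->request->uri gives, and the status line shows whether that final fetch actually succeeded.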

Re: Using LWP::Simple to read a redirected page
by Jukari (Initiate) on Nov 13, 2012 at 19:36 UTC
    Might be DNS-related... have you tried using the IP addresses directly?

      I haven't, but I'll work out how to do that and see if it makes a difference.

      Thanks for the suggestion

Re: Using LWP::Simple to read a redirected page
by zentara (Archbishop) on Nov 13, 2012 at 20:12 UTC
      The request object contains the url you've been redirected to

      Maybe I'm missing a subtlety, but this seems to say that the URI taken from the request should be the final URL. It doesn't match the final URL I see in Chrome, though.

Re: Using LWP::Simple to read a redirected page
by Anonymous Monk on Nov 13, 2012 at 22:34 UTC

    Hi

    OK, long story short: I think the URL needs http:// in front of it. Without a scheme, LWP just works with what it's got and doesn't complain.

    I tried again using WWW::Mechanize, and it demanded an absolute URL. When I added http://, it said it couldn't handle https (presumably because the redirect ends at an https:// address) and told me to install LWP-Protocol-https.

    I went back to LWP, fed it the absolute URL, and it resolved properly, giving me the forwarded URL as it should. Out of interest I removed LWP-Protocol-https, and that didn't seem to bother it.

    Thanks for the help

    MorayJ
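The fix in the last reply (give LWP an absolute URL) can be applied to a whole list of stored links before fetching them. This is a minimal sketch, assuming each link is either already absolute or a bare host/path string like the direct.gov.uk one. ensure_absolute and main are hypothetical helpers, and defaulting to http:// is an assumption (the server can still redirect on to https, as it does here).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: if the link does not start with "scheme://"
# (e.g. it was stored as "www.direct.gov.uk/..."), assume plain http://
# so that LWP receives an absolute URL. Links that already start with
# "scheme://" are returned unchanged.
sub ensure_absolute {
    my ($url) = @_;
    return $url if $url =~ m{^[a-z][a-z0-9+.\-]*://}i;
    return "http://$url";
}

# Only touches the network when links are given on the command line.
sub main {
    require LWP::UserAgent;
    my $browser = LWP::UserAgent->new;
    for my $link (@_) {
        my $url      = ensure_absolute($link);
        my $response = $browser->get($url);
        # Print the status too: a failed fetch still carries a request
        # object, so ->request->uri alone does not prove the link works.
        printf "%s -> %s (%s)\n", $link, $response->request->uri, $response->status_line;
    }
}

main(@ARGV) if @ARGV;
```

Since LWP already depends on the URI module, URI->new($link)->scheme returning undef is another way to spot these scheme-less links. As for removing LWP-Protocol-https seeming harmless: my understanding is that LWP would still follow the redirect and record the https:// request, so ->request->uri shows the forwarded URL even though the final status line reports an error.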

Node Type: perlquestion [id://1003678]
Front-paged by Arunbear