Using LWP::Simple to read a redirected page

by MorayJ (Beadle)
on Nov 13, 2012 at 17:43 UTC
MorayJ has asked for the wisdom of the Perl Monks concerning the following question:


The UK government has changed its website and I'm trying to check up on links that I have to see if they still work on the new structure

I'm using LWP for this (LWP::UserAgent, as in the code below).

If I put in the web address, $request->uri very kindly returns the address it redirects to, $request being found with:

    my $browser  = LWP::UserAgent->new;
    my $response = $browser->get( $url );
    my $request  = $response->request();

This is where the site now sends you if you go to that url.
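A minimal sketch of inspecting where LWP::UserAgent ended up. By default it follows redirects for GET requests, `$response->request->uri` is the URL of the last request it actually sent, and HTTP::Response's `redirects` method lists the intermediate redirect responses. The hosts old.example and new.example and the helper name final_url are hypothetical; the request_send handler fakes the server so the sketch runs offline, and dropping the add_handler call makes it talk to a real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper: return the status code, the final URL, and every URL
# that answered with a redirect on the way there.
sub final_url {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);    # follows GET redirects by default
    my @hops = map { $_->request->uri->as_string } $response->redirects;
    return ($response->code, $response->request->uri->as_string, @hops);
}

my $ua = LWP::UserAgent->new;

# Offline stand-in for a real server (hypothetical hosts): the old URL answers
# with a 301 pointing at the new one, and anything else answers 200.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    if ($request->uri->as_string eq 'http://old.example/page') {
        my $moved = HTTP::Response->new(301, 'Moved Permanently');
        $moved->header(Location => 'http://new.example/page');
        return $moved;
    }
    return HTTP::Response->new(200, 'OK', [], 'new page');
});

my ($code, $final, @hops) = final_url($ua, 'http://old.example/page');
print "$code $final\n";      # 200 http://new.example/page
print "via $_\n" for @hops;  # via http://old.example/page
```

If the final URL still equals the one you passed in, it is worth printing `$response->status_line` too: an error response also carries the request that produced it.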

Difficulty is encountered with other links: a browser takes you on to a new page, but $request->uri returns the original url I put in.

What are they doing differently? What do I need to do differently? I guess it's probably more of a web question than just a Perl one.

Thanks for any advice


Re: Using LWP::Simple to read a redirected page
by zentara (Archbishop) on Nov 13, 2012 at 20:12 UTC
      The request object contains the url you've been redirected to

      Maybe I'm missing a subtlety, but this seems to say that the uri taken from the request should be the final url. Yet it doesn't match the final url I see in Chrome.

Re: Using LWP::Simple to read a redirected page
by Anonymous Monk on Nov 13, 2012 at 22:34 UTC


    OK, long story short... I think the url needed http:// in front of it, and LWP just works with what it's given without complaining.
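A minimal sketch of that fix, assuming the links are plain web addresses: anything that doesn't already start with a `scheme://` prefix gets http:// put in front before it is handed to LWP. The helper name absolutize and the sample links are hypothetical. Without a scheme, LWP::UserAgent typically doesn't die; it returns an error response (a 400 saying the URL must be absolute) whose request still carries the original url, which would explain why $request->uri came back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: make a bare address like "www.gov.uk/foo" absolute by
# adding http:// when no "scheme://" prefix is present.
sub absolutize {
    my ($addr) = @_;
    $addr =~ s/^\s+|\s+$//g;                              # trim stray whitespace
    return $addr if $addr =~ m{^[a-z][a-z0-9+.-]*://}i;   # already absolute
    return "http://$addr";
}

# Made-up sample links for illustration.
for my $link ('www.example.com/old-page', 'https://www.example.com/new-page') {
    print absolutize($link), "\n";
}
# http://www.example.com/old-page
# https://www.example.com/new-page
```

Matching on `scheme://` rather than asking URI for a scheme is deliberate: an address like `www.example.com:8080/x` would otherwise be read as having the scheme `www.example.com`.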

    I tried again using WWW::Mechanize and it demanded an absolute url. I added http://, and it then said it couldn't deal with https and told me to install LWP-Protocol-https.

    I went back to LWP and fed it the absolute url, and it resolved properly, giving me the forwarded url as it should. Out of interest I removed LWP-Protocol-https, and that didn't seem to bother it.
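One possible reason removing LWP-Protocol-https made no visible difference: when a redirect points at an https URL and LWP::Protocol::https is missing, LWP::UserAgent still builds the redirected request, so `$response->request->uri` shows the forwarded url, but the response itself is an error (typically 501) rather than the page. A minimal sketch that checks https support up front with `is_protocol_supported` and reports the status next to the final URL; the helper name report_fetch is hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# is_protocol_supported is false for https when LWP::Protocol::https is not installed.
for my $scheme (qw(http https)) {
    printf "%-5s %s\n", $scheme,
        $ua->is_protocol_supported($scheme) ? 'supported' : 'NOT supported';
}

# Hypothetical helper: report the final URL together with the status line,
# so a redirect that ended in an error isn't mistaken for a working link.
sub report_fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return sprintf '%s -> %s (%s)',
        $url, $response->request->uri, $response->status_line;
}

# Usage, with a real absolute url (needs network access):
# print report_fetch($ua, 'http://...'), "\n";
```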

    Thanks for the help


Re: Using LWP::Simple to read a redirected page
by Jukari (Initiate) on Nov 13, 2012 at 19:36 UTC
    Might be DNS related... have you tried using the IP addresses directly?
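A minimal sketch of checking what a hostname resolves to, using only the core Socket module (IPv4 only; the helper name resolve is hypothetical). One hedged caveat: many sites pick the page to serve from the Host header, so fetching by raw IP address may not return the same page even when DNS is fine.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: return the dotted IPv4 address for $host, or undef if it
# can't be resolved. inet_aton does a name lookup for non-numeric hosts.
sub resolve {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

for my $host ('localhost', 'www.example.com') {
    my $ip = resolve($host);
    printf "%s => %s\n", $host, defined $ip ? $ip : 'lookup failed';
}
```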

      I haven't. But I'll see if I can work that out, and see if it makes a difference.

      Thanks for the suggestion

