PerlMonks  

Redirect with Mechanize

by cdherold (Monk)
on Sep 30, 2007 at 19:19 UTC ( #641805=perlquestion )
cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am using WWW::Mechanize to follow a redirect within a website (http://www.nature.com/nature/index.html). I would like Mechanize to follow the redirect that occurs on clicking the journal cover (with URL link: http://www.nature.com/nature/current_issue). The redirect shuttles the browser to a URL that includes specifics on the date of the issue, but does not include the base URL address (http://www.nature.com). I am using the following code:

    use WWW::Mechanize;

    my $agent = WWW::Mechanize->new( autocheck => 1 );
    $agent->get($url);
    die $agent->response->status_line unless $agent->success;
    my $output = $agent->content;
    print $output;

The code follows the redirect; however, the redirect URL is local, not global. Hence I end up requesting only a partial URL, i.e. "The requested URL /nature/journal/v449/n7161/ was not found on this server."

How do I get Mechanize to fill in the global URL?

Most Humbly,

Chris

Re: Redirect with Mechanize
by grinder (Bishop) on Sep 30, 2007 at 20:19 UTC

    I haven't looked closely to figure out the exact nature of their sins, but when I fire up my own copy of WWW::Mechanize, I see that there is only one link on the page /nature/current/index.html.

    It is therefore a trivial matter to fetch the link path, prefix it with the protocol and host, and fetch the resulting address:

    $mech->get('http://www.nature.com/nature/current/index.html');
    # links() returns WWW::Mechanize::Link objects, so ask the first one for its URL string
    my $link = 'http://www.nature.com' . ($mech->links)[0]->url;
    $mech->get($link);

    • another intruder with the mooring in the heart of the Perl
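The same idea can lean on WWW::Mechanize's own link objects instead of a hard-coded host: each WWW::Mechanize::Link has a url_abs method that resolves the link against the page it came from. The sketch below assumes the refresh target shows up as a link whose tag is 'meta'; get_following_refresh and the hop limit of 5 are made up for illustration and are not part of Mechanize.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url, then follow any <meta> refresh link
# by hand, at most $max_hops times so a refresh loop cannot run forever.
sub get_following_refresh {
    my ($mech, $url, $max_hops) = @_;
    $mech->get($url);
    for (1 .. $max_hops) {
        # Stop as soon as the current page has no meta link.
        my $link = $mech->find_link(tag => 'meta') or last;
        # url_abs resolves a relative target against the page's base URL.
        $mech->get($link->url_abs);
    }
    return $mech;
}

# Only touches the network when a start URL is given on the command line.
if (my $start = shift @ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 1);
    get_following_refresh($mech, $start, 5);
    print $mech->uri, "\n";
}
```

Run with the start page as the only argument; the script prints the URL it finally lands on. The hop limit is a safety choice, not something the site requires.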

Re: Redirect with Mechanize
by perrin (Chancellor) on Sep 30, 2007 at 20:37 UTC
    If Mech isn't filling in the host name correctly, what server is sending you this error message?
Re: Redirect with Mechanize
by hossman (Prior) on Sep 30, 2007 at 22:40 UTC

    http://www.nature.com/nature/current_issue 302s to http://www.nature.com/nature/current_issue/ which 302s to http://www.nature.com/nature/current_issue/index.html

    That URL generates a webpage which says...

    ...
    <meta http-equiv="Refresh" content="0;url=/nature/journal/v449/n7161/" />
    </head>
    <body onLoad="location='/nature/journal/v449/n7161/'">
    </body>
    </html>

    ...I'm not sure if WWW::Mechanize follows meta refreshes, but even if it does this particular meta refresh seems a bit bogus -- the url portion of the content attribute needs to be fully qualified.

    But since browsers seem to respect it (it seems to work in firefox even with javascript turned off) WWW::Mechanize could probably be improved to respect it as well.
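To make the point about the relative refresh target concrete, here is a minimal sketch of what a browser effectively does: pull the url= part out of the meta tag and resolve it against the address the page was fetched from. It assumes the URI module is installed (it comes along with LWP); refresh_target is a hypothetical helper, and the regexes are a rough stand-in for a real HTML parser.

```perl
use strict;
use warnings;
use URI;

# Return the absolute URL named by a <meta http-equiv="Refresh"> tag,
# or nothing if the page has none. $base is the URL the page came from.
sub refresh_target {
    my ($html, $base) = @_;

    # Find the first meta tag that declares a refresh.
    my ($tag) = $html =~ /(<meta\b[^>]*http-equiv\s*=\s*["']?refresh\b[^>]*>)/i
        or return;

    # Pull the url=... part out of its content attribute.
    my ($target) = $tag =~ /content\s*=\s*["'][^"']*?url\s*=\s*([^"'>\s]+)/i
        or return;

    # Resolve a relative target such as /nature/journal/... against the base.
    return URI->new_abs($target, $base)->as_string;
}

my $html = '<meta http-equiv="Refresh" content="0;url=/nature/journal/v449/n7161/" />';
my $base = 'http://www.nature.com/nature/current_issue/index.html';
print refresh_target($html, $base), "\n";
# prints http://www.nature.com/nature/journal/v449/n7161/
```

With a live Mechanize object, the same helper could be fed $mech->content and $mech->uri, and its result passed to $mech->get.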

      If it's an automated redirect between pages (which is quite common) that Mechanize cannot cope with, add this to your code.

      After this:

      $mech->follow_link(n=>4 ); #4 is your link number
      or
      $mech->submit();# or form submit

      add this (to follow the redirection correctly):

      $mech->follow_link(n=>0);



      You can try accessing the root of the page and then following the link.
Re: Redirect with Mechanize
by sanPerl (Friar) on Oct 01, 2007 at 07:26 UTC
    You can either follow grinder's advice or, if you are working on Windows, use the Win32::IE::Mechanize module. I guess this module can follow the redirect, since it uses IE as the user agent.
    I am sorry for providing advice without testing it on my end (which I don't normally do). I work on Linux and am currently having a problem accessing my Windows test machine through rdesktop.

Node Type: perlquestion [id://641805]
Approved by grinder