PerlMonks  

Re: Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 03, 2005 at 00:44 UTC ( [id://463071] )


in reply to Cookie protected web page and file downloading

Hi again
I've made many of the suggested changes and the script now looks like:
#!/usr/bin/perl
use Data::Dumper;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    cookie_jar        => {},
    agent             => "WWW-Mechanize/0.01",
    protocols_allowed => ['http'],
    autocheck         => 1);

$url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';

$mech->proxy('http','172.17.1.248');

$mech->get( $url );
print Dumper $mech;
$a=<STDIN>;
`clear`;

$mech->follow_link( text_regex => qr/More fix services/) or die;
print Dumper $mech;
$a=<STDIN>;
`clear`;

$mech->follow_link( text_regex => qr/AIX 5.3/) or die;
print Dumper $mech;
$a=<STDIN>;
`clear`;

$mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
print Dumper $mech;

Now I'm getting the following error:
Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Access to 'http' URIs has been disabled at aixfixes.pl line 15
It's dying on the $mech->get( $url ); line.
If I remove the protocols_allowed => ['http'], bit I get a different error:
Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Protocol scheme '' is not supported at lwp.pl line 15.
I can access this page manually with lynx, so I must be doing something wrong in the script. Any more ideas?
Thanks

Re^2: Cookie protected web page and file downloading
by dave0 (Friar) on Jun 03, 2005 at 01:51 UTC
    It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

    Try $mech->proxy('http', 'http://172.17.1.248/') and see if that works.
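A minimal sketch of the point above: the "Protocol scheme '' is not supported" error is consistent with the proxy string carrying no scheme at all. The helper name proxy_url is hypothetical (it is not part of WWW::Mechanize or LWP), and it assumes a plain HTTP proxy whenever no scheme is given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): turn a bare host or
# IP address into an absolute proxy URL, leaving real URLs untouched.
sub proxy_url {
    my ($spec) = @_;

    # Already has a scheme such as "http://"? Then use it as given.
    return $spec if $spec =~ m{^[A-Za-z][A-Za-z0-9+.-]*://};

    # Otherwise assume a plain HTTP proxy (an assumption of this sketch).
    return "http://$spec/";
}

print proxy_url('172.17.1.248'), "\n";          # http://172.17.1.248/
print proxy_url('http://172.17.1.248/'), "\n";  # http://172.17.1.248/
```

With a helper like this, the call in the script would read $mech->proxy('http', proxy_url('172.17.1.248')).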

      Yay! Thank you!!

      This almost works now!

      I can now see the link to the file I want to download, but I'm not sure how to reference it.

      'last_uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',
      'uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',

      How do I save this? I've looked at lwp-download but that only confused me more.
      Thanks
        I know it's bad form to be replying to my own posts but this is fixed now thanks to a work mate.

        The final code is:

        use Data::Dumper;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new(
            cookie_jar          => {},
            agent               => "WWW-Mechanize/0.01",
            protocols_allowed   => ['http','https'],
            protocols_forbidden => [undef],
            autocheck           => 1);

        $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';

        $mech->proxy('http','http://PROXY/');

        $mech->get( $url );
        #print Dumper $mech;

        $mech->follow_link( text_regex => qr/More fix services/) or die;
        #print Dumper $mech;

        $mech->follow_link( text_regex => qr/AIX 5.3/) or die;
        #print Dumper $mech;

        $mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
        #print Dumper $mech;

        $mech->get('http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53');
        print $mech->content;
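The final code prints the download to standard output. As one possible alternative, the sketch below writes the response body straight to a file by passing the ':content_file' option to get. The helper name save_url is hypothetical, and it assumes a $mech object created with autocheck => 1, so that a failed request dies instead of returning quietly.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url with an existing Mechanize object and
# write the response body to $file instead of keeping it in memory.
# Assumes $mech was created with autocheck => 1, so a failed GET dies.
sub save_url {
    my ($mech, $url, $file) = @_;

    # ':content_file' is the LWP::UserAgent option that sends the body to a file.
    $mech->get($url, ':content_file' => $file);

    return -s $file;    # size in bytes of the file that was written
}
```

In the script above this might be called as save_url($mech, 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53', 'LatestFixData53'), where the local file name LatestFixData53 is only an assumption.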
