Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 02, 2005 at 03:10 UTC (#462732)
reTard has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm trying to download a file from a web site using a Perl script, but it is 'protected' by a cookie that is set a couple of clicks back.
How should I do this? I've looked (briefly) at LWP but I don't know if that's what I need. There is also a proxy between my workstation and the internet (but it doesn't require authentication).
Thanks
reTard

Re: Cookie protected web page and file downloading
by tlm (Prior) on Jun 02, 2005 at 03:13 UTC

    I find that WWW::Mechanize is pretty good with cookie-ness. Have you tried it?

    the lowliest monk

      I've had a look at Mech but I can't see where to specify the HTTP proxy details.
      thanks
Re: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 03, 2005 at 00:44 UTC
    Hi again
    I've made many of the suggested changes and the script now looks like:
    #!/usr/bin/perl

    use Data::Dumper;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(
        cookie_jar => {},
        agent => "WWW-Mechanize/0.01",
        protocols_allowed => ['http'],
        autocheck => 1);

    $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';

    $mech->proxy('http','172.17.1.248');
    $mech->get( $url );
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/More fix services/) or die;
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/AIX 5.3/) or die;
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
    print Dumper $mech;

    Now I'm getting the following error:
    Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Access to 'http' URIs has been disabled at aixfixes.pl line 15
    It's dying on the $mech->get( $url ); line.
    If I remove the protocols_allowed => ['http'], bit I get a different error:
    Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Protocol scheme '' is not supported at lwp.pl line 15.
    I can manually access this page with lynx so it must be something in the script I'm doing wrong. Any more ideas?
    Thanks
      It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

      Try $mech->proxy('http', 'http://172.17.1.248/') and see if that works.
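
A minimal sketch of the proxy setup suggested above. WWW::Mechanize is a subclass of LWP::UserAgent, so the proxy() and env_proxy() methods come from there. The address is the one quoted in the thread; no port appears in the thread, so none is given here, and the ":8080" mentioned in the comment is purely hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# The second argument is a full URL, scheme included, not a bare IP address.
# Append a port (for example ':8080', a hypothetical value) if your proxy
# listens on one.
$mech->proxy( 'http', 'http://172.17.1.248/' );

# Alternative: take the proxy from environment variables such as http_proxy.
# $mech->env_proxy;

# Called with only the scheme, proxy() reports what is currently configured.
print "http proxy: ", $mech->proxy('http'), "\n";
```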

        Yay! Thank you!!

        This almost works now!

        I can see the link to the file I want to download now but I'm not sure how to reference it.

        'last_uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',
        'uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',

        How do I save this? I've looked at lwp-download but that only confused me more.
        Thanks
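
One way to save the file, sketched under a few assumptions: the same $mech object that followed the links is reused, so the cookies it collected and its proxy setting still apply, and the download is written straight to disk with the ':content_file' option of get(). The helper name download_to_file and the local file name are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $url with an existing Mechanize object and write the response body
# directly to $file instead of keeping it in memory.
# (download_to_file is a hypothetical helper name.)
sub download_to_file {
    my ( $mech, $url, $file ) = @_;

    # With autocheck => 1 a failed request dies here with an error message.
    $mech->get( $url, ':content_file' => $file );

    return $mech->success;
}
```

In the script above this would go after the last follow_link() call, for example download_to_file( $mech, $mech->uri, 'LatestFixData53' ), since $mech->uri is already the download URL shown in the dump. WWW::Mechanize also has a save_content() method that writes out the page it has just fetched; whether it handles a binary download cleanly may depend on the installed version, so that is worth checking before relying on it.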
