PerlMonks  

Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 02, 2005 at 03:10 UTC ( #462732=perlquestion )
reTard has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm trying to download a file from a web site using a Perl script, but it is 'protected' by a cookie that is set a couple of clicks back.
How should I do this? I've looked (briefly) at LWP but I don't know if that's what I need. There is also a proxy between my workstation and the internet (but it doesn't require authentication).
Thanks
reTard

Re: Cookie protected web page and file downloading
by tlm (Prior) on Jun 02, 2005 at 03:13 UTC

    I find that WWW::Mechanize is pretty good with cookie-ness. Have you tried it?

    the lowliest monk

      I've had a look at Mech but I can't see where to specify the HTTP proxy details?
      thanks
Re: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 03, 2005 at 00:44 UTC
    Hi again
    I've made many of the suggested changes and the script now looks like:
        #!/usr/bin/perl
        use Data::Dumper;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new(
            cookie_jar        => {},
            agent             => "WWW-Mechanize/0.01",
            protocols_allowed => ['http'],
            autocheck         => 1);

        $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';
        $mech->proxy('http','172.17.1.248');
        $mech->get( $url );
        print Dumper $mech; $a=<STDIN>; `clear`;
        $mech->follow_link( text_regex => qr/More fix services/) or die;
        print Dumper $mech; $a=<STDIN>; `clear`;
        $mech->follow_link( text_regex => qr/AIX 5.3/) or die;
        print Dumper $mech; $a=<STDIN>; `clear`;
        $mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
        print Dumper $mech;

    Now I'm getting the following error:
    Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Access to 'http' URIs has been disabled at aixfixes.pl line 15
    It's dying on the $mech->get( $url ); line.
    If I remove the protocols_allowed => ['http'], bit I get a different error:
    Error GETing http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html: Protocol scheme '' is not supported at lwp.pl line 15.
    I can manually access this page with lynx so it must be something in the script I'm doing wrong. Any more ideas?
    Thanks
      It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

      Try $mech->proxy('http', 'http://172.17.1.248/') and see if that works.
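
The advice above can be sketched roughly as follows. This is only an illustration, not the poster's final script: the helper name proxy_url is made up for this example, and the address is the one quoted in the thread. WWW::Mechanize inherits proxy() from LWP::UserAgent, which expects a scheme (or list of schemes) followed by a full proxy URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): turn a bare host or
# host:port into a proxy URL, and leave anything that already has a
# scheme untouched.
sub proxy_url {
    my ($addr) = @_;
    return $addr =~ m{^\w+://} ? $addr : "http://$addr/";
}

# WWW::Mechanize keeps an in-memory cookie jar by default, so cookies set
# on earlier pages are sent along with later requests from the same object.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# proxy() is inherited from LWP::UserAgent: scheme first, then the proxy URL.
$mech->proxy( 'http', proxy_url('172.17.1.248') );
```

If the proxy is already described by environment variables such as http_proxy, calling $mech->env_proxy (also inherited from LWP::UserAgent) may be enough instead of hard-coding the address.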

        Yay! Thank you!!

        This almost works now!

        I can see the link to the file I want to download now, but I'm not sure how to reference it.

        'last_uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',
        'uri' => 'http://www-912.ibm.com/eserver/support/fixinfo/download?file=LatestFixData53',

        How do I save this? I've looked at lwp-download but that only confused me more.
        Thanks
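
The thread as captured here does not include an answer to this last question. One possible approach, offered only as a sketch: LWP's get() accepts a ':content_file' option that writes the response body straight to disk, and WWW::Mechanize passes it through. The sub name save_to_file and the command-line usage are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url with an existing Mechanize object (so its
# cookies and proxy settings are reused) and write the body to $file.
sub save_to_file {
    my ($mech, $url, $file) = @_;
    # ':content_file' is an LWP::UserAgent option; the body goes to the
    # file instead of being kept in memory.
    my $res = $mech->get( $url, ':content_file' => $file );
    return $res->is_success;
}

# Stand-alone usage: perl save.pl URL FILE
if ( @ARGV == 2 ) {
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->env_proxy;    # pick up http_proxy etc. from the environment, if set
    save_to_file( $mech, @ARGV ) or die "download failed\n";
}
```

In the script from this thread, the same $mech that followed the links would be passed in, for example save_to_file( $mech, $mech->uri, 'LatestFixData53' ) after the last follow_link call; the output filename there is just a guess based on the query string. Newer WWW::Mechanize releases also provide $mech->save_content( $filename ), which saves the page that was already fetched without requesting it again.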
