Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 02, 2005 at 03:10 UTC
reTard has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm trying to download a file from a web site using a Perl script, but it is 'protected' by a cookie that gets set a couple of clicks back.
How should I do this? I've looked (briefly) at LWP but I don't know if that's what I need. There is also a proxy between my workstation and the internet (but it doesn't require authentication).

Replies are listed 'Best First'.
Re: Cookie protected web page and file downloading
by tlm (Prior) on Jun 02, 2005 at 03:13 UTC

    I find that WWW::Mechanize is pretty good with cookie-ness. Have you tried it?

    the lowliest monk

      I've had a look at Mech but I can't see where to specify the HTTP proxy details?
Re: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 03, 2005 at 00:44 UTC
    Hi again
    I've made many of the suggested changes and the script now looks like:
    #!/usr/bin/perl

    use Data::Dumper;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(
        cookie_jar        => {},
        agent             => "WWW-Mechanize/0.01",
        protocols_allowed => ['http'],
        autocheck         => 1
    );

    $url = ' +ml';
    $mech->proxy('http','');
    $mech->get( $url );
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/More fix services/) or die;
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/AIX 5.3/) or die;
    print Dumper $mech;
    $a=<STDIN>;
    `clear`;

    $mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
    print Dumper $mech;

    Now I'm getting the following error:
    Error GETing Access to 'http' URIs has been disabled at line 15
    It's dying on the $mech->get( $url ); line.
    If I remove the protocols_allowed => ['http'], bit I get a different error:
    Error GETing Protocol scheme '' is not supported at line 15.
    I can manually access this page with lynx so it must be something in the script I'm doing wrong. Any more ideas?
      It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

      Try $mech->proxy('http', '') and see if that works.
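
As a minimal sketch of the advice above: the proxy is given as a full URL, scheme included, rather than a bare IP address. The host name and port below are hypothetical placeholders, not the values from the original script, and `proxy` and `env_proxy` are the methods WWW::Mechanize inherits from LWP::UserAgent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical proxy address: replace host and port with your own.
# Note the leading "http://"; a bare IP address is not a URL.
my $proxy_url = 'http://proxy.example.com:8080/';

my $mech = WWW::Mechanize->new(
    cookie_jar => {},    # in-memory cookie jar, kept for the whole session
    autocheck  => 1,     # die on failed requests
);

# Send requests for http:// URLs through the proxy.
$mech->proxy( 'http', $proxy_url );

# Alternative, if the http_proxy environment variable is already set:
# $mech->env_proxy;
```

Once the proxy is set this way, later `$mech->get(...)` and `$mech->follow_link(...)` calls for `http` URLs should go through it, with the cookies collected along the way.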

        Yay! Thank you!!

        This almost works now!

        I can see the link to the file I want to download now, but I'm not sure how to reference it.

        'last_uri' => ' +/fixinfo/download?file=LatestFixData53',
        'uri' => ' +nfo/download?file=LatestFixData53',

        How do I save this? I've looked at lwp-download but that only confused me more.
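
One possible way to save it, sketched under some assumptions: once `follow_link` has fetched the download URL, the response body is already held by the Mechanize object, so it can be written out with `save_content`. The helper names `filename_from_uri` and `save_last_response` are made up for this example, and taking the local file name from the `file=` query parameter is only a guess at what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use URI;
use WWW::Mechanize;

# Hypothetical helper: take the local file name from the "file"
# query parameter of a download URL, e.g. ...download?file=LatestFixData53
sub filename_from_uri {
    my ($uri) = @_;
    my %query = URI->new("$uri")->query_form;
    return undef unless defined $query{file} && length $query{file};
    return basename( $query{file} );    # drop any directory part
}

# Hypothetical helper: write the most recently fetched response to disk.
sub save_last_response {
    my ($mech) = @_;
    my $name = filename_from_uri( $mech->uri );
    defined $name or die "No 'file' parameter in " . $mech->uri . "\n";
    $mech->save_content( $name, binary => 1 );    # raw bytes, no decoding
    return $name;
}
```

With the script above, calling `save_last_response($mech)` right after the last `follow_link` would write the file into the current directory. For a large file, `$mech->get( $url, ':content_file' => $name )` is an alternative that streams the body straight to disk instead of holding it in memory.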
