
Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 02, 2005 at 03:10 UTC (#462732=perlquestion)

reTard has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm trying to download a file from a web site using a Perl script, but it is 'protected' by a cookie that gets set a couple of clicks back.
How should I do this? I've looked (briefly) at LWP, but I don't know if that's what I need. There is also a proxy between my workstation and the internet (but it doesn't require authentication).
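LWP can handle this on its own. A minimal sketch, assuming the cookie is set by an ordinary page fetch: give LWP::UserAgent an HTTP::Cookies jar so the cookie from the earlier page is sent back automatically, route requests through the proxy, then stream the file to disk with the `:content_file` option. The proxy address, the URLs and the `fetch_protected_file` helper are hypothetical placeholders, not taken from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical proxy; it must be a full URL with a scheme, not a bare IP.
my $proxy_url = 'http://proxy.example.com:8080/';

my $ua = LWP::UserAgent->new(
    cookie_jar => HTTP::Cookies->new,   # in-memory jar, reused on every request
);
$ua->proxy(['http', 'https'], $proxy_url);

# Hypothetical helper: visit the page(s) that set the cookie, then fetch
# the protected file straight to disk. Dies if any request fails.
sub fetch_protected_file {
    my ($ua, $file_url, $dest, @cookie_pages) = @_;
    for my $page (@cookie_pages) {
        my $res = $ua->get($page);
        die "GET $page failed: ", $res->status_line, "\n" unless $res->is_success;
    }
    my $res = $ua->get($file_url, ':content_file' => $dest);
    die "GET $file_url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $dest;
}

# Example call (hypothetical URLs):
# fetch_protected_file($ua, 'http://www.example.com/download?file=data',
#                      'data.bin', 'http://www.example.com/start.html');
```

Because the jar lives on the user agent, any cookie set while visiting `@cookie_pages` is sent with the final download request without further code.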

Re: Cookie protected web page and file downloading
by tlm (Prior) on Jun 02, 2005 at 03:13 UTC

    I find that WWW::Mechanize is pretty good with cookie-ness. Have you tried it?

    the lowliest monk

      I've had a look at Mech, but I can't see where to specify the HTTP proxy details.
Re: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 03, 2005 at 00:44 UTC
    Hi again
    I've made many of the suggested changes and the script now looks like:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new(
            cookie_jar        => {},                     # in-memory cookie jar
            agent             => "WWW-Mechanize/0.01",
            protocols_allowed => ['http'],
            autocheck         => 1,                      # die if a request fails
        );

        my $url = ' +ml';
        $mech->proxy('http', '');
        $mech->get($url);
        print Dumper $mech;
        <STDIN>;            # pause until Enter is pressed
        system('clear');    # backticks would capture clear's output instead of clearing the screen

        $mech->follow_link(text_regex => qr/More fix services/) or die;
        print Dumper $mech;
        <STDIN>;
        system('clear');

        $mech->follow_link(text_regex => qr/AIX 5.3/) or die;
        print Dumper $mech;
        <STDIN>;
        system('clear');

        $mech->follow_link(text_regex => qr/Data file for AIX 5.3/) or die;
        print Dumper $mech;

    Now I'm getting the following error:
    Error GETing Access to 'http' URIs has been disabled at line 15
    It's dying on the $mech->get( $url ); line.
    If I remove the protocols_allowed => ['http'] option, I get a different error:
    Error GETing Protocol scheme '' is not supported at line 15.
    I can manually access this page with lynx so it must be something in the script I'm doing wrong. Any more ideas?
      It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

      Try $mech->proxy('http', '') and see if that works.
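WWW::Mechanize is a subclass of LWP::UserAgent, so it inherits LWP's proxy() and env_proxy() methods, and the proxy goes in as an absolute URL with a scheme (http://host:port/). Recent LWP releases may refuse a proxy value without a scheme when it is set. A minimal sketch; the proxy host and port are hypothetical placeholders for your own.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical proxy address: note the http:// scheme and the port.
my $proxy_url = 'http://proxy.example.com:8080/';

# Mechanize keeps an in-memory cookie jar by default.
my $mech = WWW::Mechanize->new(autocheck => 1);

# Route plain-HTTP requests through the proxy.
$mech->proxy(['http'], $proxy_url);

# Alternative: take the proxy from the http_proxy environment variable.
# $mech->env_proxy;
```

Calling `$mech->proxy('http')` with only the scheme returns the proxy currently configured for it, which is a quick way to check the setting.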

        Yay! Thank you!!

        This almost works now!

        I can see the link to the file I want to download now, but I'm not sure how to reference it.

        'last_uri' => ' +/fixinfo/download?file=LatestFixData53', 'uri' => ' +nfo/download?file=LatestFixData53',

        How do I save this? I've looked at lwp-download but that only confused me more.
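Judging by the dumped 'uri', the last follow_link has already requested download?file=LatestFixData53, so the file's bytes should already be in $mech->response. A minimal sketch, assuming that final response is the file itself rather than another HTML page: write the raw, undecoded body to disk in binary mode. The helper name save_current_page and the output filename are hypothetical; recent WWW::Mechanize versions also offer $mech->save_content($filename) for the same job.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: write the body of the most recent response to $filename
# and return the number of bytes written.
sub save_current_page {
    my ($mech, $filename) = @_;
    my $res = $mech->response or die "No request has been made yet\n";
    open my $out, '>:raw', $filename or die "Can't write $filename: $!\n";
    print {$out} $res->content;          # raw bytes, no charset decoding
    close $out or die "Can't close $filename: $!\n";
    return length $res->content;
}

# After the last follow_link in the script above (hypothetical filename):
# save_current_page($mech, 'LatestFixData53');

# Alternative: re-request the URL and let LWP stream it straight to disk.
# $mech->get($mech->uri, ':content_file' => 'LatestFixData53');
```

Binary mode matters here: a data file written through a text-mode handle can be altered on some platforms.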

Node Type: perlquestion [id://462732]
Approved by tlm