
Cookie protected web page and file downloading

by reTard (Sexton)
on Jun 02, 2005 at 03:10 UTC (#462732=perlquestion)
reTard has asked for the wisdom of the Perl Monks concerning the following question:

Hi all
I'm trying to download a file from a web site using a Perl script, but it is 'protected' by a cookie that is set a couple of clicks back.
How should I do this? I've looked (briefly) at LWP but I don't know if that's what I need. There is also a proxy between my workstation and the internet (but it doesn't require authentication).

Re: Cookie protected web page and file downloading
by tlm (Prior) on Jun 02, 2005 at 03:13 UTC

    I find that WWW::Mechanize is pretty good with cookie-ness. Have you tried it?

    the lowliest monk

      I've had a look at Mech but I can't see where to specify the HTTP proxy details.
Re: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 03, 2005 at 00:44 UTC
    Hi again
    I've made many of the suggested changes and the script now looks like:
    #!/usr/bin/perl
    use Data::Dumper;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(
        cookie_jar        => {},
        agent             => "WWW-Mechanize/0.01",
        protocols_allowed => ['http'],
        autocheck         => 1,
    );
    $url = ' +ml';
    $mech->proxy('http','');
    $mech->get( $url );
    print Dumper $mech;
    $a=<STDIN>; `clear`;
    $mech->follow_link( text_regex => qr/More fix services/) or die;
    print Dumper $mech;
    $a=<STDIN>; `clear`;
    $mech->follow_link( text_regex => qr/AIX 5.3/) or die;
    print Dumper $mech;
    $a=<STDIN>; `clear`;
    $mech->follow_link( text_regex => qr/Data file for AIX 5.3/) or die;
    print Dumper $mech;

    Now I'm getting the following error:
    Error GETing Access to 'http' URIs has been disabled at line 15
    It's dying on the $mech->get( $url ); line.
    If I remove the protocols_allowed => ['http'], bit I get a different error:
    Error GETing Protocol scheme '' is not supported at line 15.
    I can manually access this page with lynx so it must be something in the script I'm doing wrong. Any more ideas?
      It looks like you're specifying your proxy incorrectly. $mech->proxy() takes a URL as its second argument, not an IP address.

      Try $mech->proxy('http', '') and see if that works.
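To illustrate the advice above, here is a minimal sketch of setting the proxy as a full URL. The host name and port are placeholders (the real values are not shown in this thread), so substitute your own. The `proxy` method is inherited from LWP::UserAgent; called with only a scheme, it returns the proxy currently set for that scheme.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# ASSUMPTION: placeholder proxy address, not the poster's real proxy.
my $proxy_url = 'http://proxy.example.com:8080/';

my $mech = WWW::Mechanize->new(
    cookie_jar => {},    # keep cookies in memory between requests
    autocheck  => 1,     # die with a message when a request fails
);

# The second argument is a URL including the scheme, not a bare IP address.
$mech->proxy( 'http', $proxy_url );

# With a single argument, proxy() reports what is configured for that scheme.
print "HTTP proxy is now: ", $mech->proxy('http'), "\n";
```

If the proxy is already exported in the `http_proxy` environment variable, calling `$mech->env_proxy` should pick it up without hard-coding the address in the script.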

        Yay! Thank you!!

        This almost works now!

        I can see the link to the file I want to download now, but I'm not sure how to reference it.

        'last_uri' => ' +/fixinfo/download?file=LatestFixData53', 'uri' => ' +nfo/download?file=LatestFixData53',

        How do I save this? I've looked at lwp-download but that only confused me more.
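One possible way to save the file, sketched under assumptions: `save_link` is a hypothetical helper (not part of WWW::Mechanize), and the link text and output file name in the usage comment are guesses based on the script and dump above. It looks the link up with `find_link` and fetches it with the `:content_file` option, which LWP::UserAgent uses to write the response body straight to disk.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( cookie_jar => {}, autocheck => 1 );

# Hypothetical helper: find a link on the current page by its text and
# download its target to $file. Cookies collected so far are sent along
# because the same $mech object makes the request.
sub save_link {
    my ( $mech, $link_regex, $file ) = @_;

    my $link = $mech->find_link( text_regex => $link_regex )
        or die "No link matching $link_regex on the current page\n";

    # ':content_file' writes the body to $file instead of keeping it in memory.
    $mech->get( $link->url_abs, ':content_file' => $file );
    return $file;
}

# Usage, after navigating to the page that carries the download link
# (link text and file name are assumptions):
#   save_link( $mech, qr/Data file for AIX 5\.3/, 'LatestFixData53' );
```

If the script has already followed the download link, so that the file is the current response, `$mech->save_content('LatestFixData53')` may be simpler, since it writes the last fetched content to the named file.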

Node Type: perlquestion [id://462732]
Approved by tlm