PerlMonks  

Re^3: Cookie protected web page and file downloading

by tlm (Prior)
on Jun 02, 2005 at 03:49 UTC ( #462738=note )


in reply to Re^2: Cookie protected web page and file downloading
in thread Cookie protected web page and file downloading

WWW::Mechanize is a subclass of LWP::UserAgent, which has methods for setting proxy info; see the docs for the latter.

the lowliest monk


Replies are listed 'Best First'.
Re^4: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 02, 2005 at 05:41 UTC
    So far I have:
    #!/usr/bin/perl
    use Data::Dumper;
    use LWP::UserAgent;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(cookie_jar => {}, agent => "WWW-Mechanize/0.01");
    $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';
    my $ua = new LWP::UserAgent;
    $ua->proxy('http','192.168.1.248');
    $mech->get( $url );
    $mech->follow_link( text_regex => qr/More fix services/);
    $mech->follow_link( text_regex => qr/AIX 5.3/);
    $mech->follow_link( text_regex => qr/Data file for AIX 5.3/);
    print Dumper $mech;

    But this fails, as it is not going through the proxy.
    Thanks

    UPDATE: the print dump shows

    'status' => 500,
    'content' => '500 Can\'t connect to www.ibm.com:80 (Bad hostname \'www.ibm.com\')

      I would not expect what you have to work, because you have created two user agent objects, $ua and $mech (yes, the latter is a user agent object too, since WWW::Mechanize is a subclass of LWP::UserAgent): $ua is configured to use a proxy but is otherwise never used, while $mech, which makes all the requests, is not configured to use a proxy at all. I think what you want is something more like this:

      #!/usr/bin/perl
      use Data::Dumper;
      use WWW::Mechanize;

      my $mech = WWW::Mechanize->new(cookie_jar => {}, agent => "WWW-Mechanize/0.01");
      $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';
      $mech->proxy('http','192.168.1.248');
      $mech->get( $url );
      $mech->follow_link( text_regex => qr/More fix services/);
      $mech->follow_link( text_regex => qr/AIX 5.3/);
      $mech->follow_link( text_regex => qr/Data file for AIX 5.3/);
      print Dumper $mech;
      Note that what I have done is treat $mech as an LWP::UserAgent object. (If it's not clear what's going on, take a look at perltoot.)

      BTW, you should get into the habit of checking for the success of requests made through the $mech object; you do this with its is_success method.
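      For instance, a minimal sketch of that habit (the URL and proxy address are the ones from the thread; note that WWW::Mechanize exposes the check both as $mech->success and, on the response object, as $mech->res->is_success, and that LWP expects the proxy to be given as a full URL):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 0 );     # we check success ourselves

# LWP wants a full URL for the proxy, not a bare IP
$mech->proxy( 'http', 'http://192.168.1.248/' );

$mech->get('http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html');
die "GET failed with status ", $mech->status, "\n" unless $mech->success;

$mech->follow_link( text_regex => qr/More fix services/ );
die "follow_link failed with status ", $mech->status, "\n" unless $mech->success;
```

      Checking after every request means a failure is reported at the step that caused it, rather than surfacing later as a confusing error on an empty page.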

      the lowliest monk

        Instead of manually checking for success or failure after each step, I find it more convenient to have the Mechanize object die on error. This is very handy during quick development; later, if you want a robust program, you should disable that feature again and handle errors explicitly.

        ...
        my $mech = WWW::Mechanize->new( autocheck => 1 );
        # cookie jar and user agent are set implicitly
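        With autocheck on, a failed request raises an exception, so a more robust program can trap it with eval instead of letting the whole script die (a sketch; the URL is the one from the thread, used purely for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# autocheck => 1 makes any failed request die, so trap it with eval;
# $@ holds the error message afterwards
eval {
    $mech->get('http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html');
};
if ($@) {
    warn "fetch failed: $@";
}
```

        This gives the convenience of autocheck during development while still letting the finished program decide what to do when a request fails.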
