PerlMonks
Re^3: Cookie protected web page and file downloading

by tlm (Prior)
on Jun 02, 2005 at 03:49 UTC


in reply to Re^2: Cookie protected web page and file downloading
in thread Cookie protected web page and file downloading

WWW::Mechanize is a subclass of LWP::UserAgent, which has methods for setting proxy info; see the docs for the latter.
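A minimal sketch of that approach (the proxy host and port here are placeholders, not values from the thread): since WWW::Mechanize inherits from LWP::UserAgent, the inherited proxy and env_proxy methods can be called directly on the Mechanize object.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( cookie_jar => {} );

# proxy() is inherited from LWP::UserAgent; the proxy URL below is a
# placeholder -- substitute your own proxy host and port.
$mech->proxy( 'http', 'http://192.168.1.248:8080/' );

# Alternatively, pick up http_proxy / ftp_proxy / no_proxy from the
# environment instead of hard-coding an address:
# $mech->env_proxy;

$mech->get('http://www.example.com/');   # placeholder URL
```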

the lowliest monk


Re^4: Cookie protected web page and file downloading
by reTard (Sexton) on Jun 02, 2005 at 05:41 UTC
    So far I have:
    #!/usr/bin/perl
    use Data::Dumper;
    use LWP::UserAgent;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(cookie_jar => {}, agent => "WWW-Mechanize/0.01");
    $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';
    my $ua = new LWP::UserAgent;
    $ua->proxy('http','192.168.1.248');
    $mech->get( $url );
    $mech->follow_link( text_regex => qr/More fix services/);
    $mech->follow_link( text_regex => qr/AIX 5.3/);
    $mech->follow_link( text_regex => qr/Data file for AIX 5.3/);
    print Dumper $mech;

    But this fails, as it is not going through the proxy.
    Thanks

    UPDATE: the print dump shows

    'status' => 500,
    'content' => '500 Can\'t connect to www.ibm.com:80 (Bad hostname \'www.ibm.com\')

      I would not expect what you have to work, because you have created two user agent objects: $ua, which is configured to use a proxy but is otherwise never used, and $mech, which makes all the requests but is not configured to use a proxy. (Yes, $mech is a user agent object too, because WWW::Mechanize is a subclass of LWP::UserAgent.) I think what you want is something more like this:

      #!/usr/bin/perl
      use Data::Dumper;
      use WWW::Mechanize;

      my $mech = WWW::Mechanize->new(cookie_jar => {}, agent => "WWW-Mechanize/0.01");
      $url = 'http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html';
      $mech->proxy('http','192.168.1.248');
      $mech->get( $url );
      $mech->follow_link( text_regex => qr/More fix services/);
      $mech->follow_link( text_regex => qr/AIX 5.3/);
      $mech->follow_link( text_regex => qr/Data file for AIX 5.3/);
      print Dumper $mech;
      Note that what I have done is treat $mech as an LWP::UserAgent object. (If it's not clear what's going on, take a look at perltoot.)
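The subclass relationship can be verified directly with Perl's built-in isa method; a tiny illustration (assuming WWW::Mechanize is installed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# isa() walks the inheritance chain, so this confirms that every
# LWP::UserAgent method (proxy, env_proxy, ...) is available on $mech.
print "Mechanize isa LWP::UserAgent? ",
      $mech->isa('LWP::UserAgent') ? "yes" : "no", "\n";
```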

      BTW, you should get into the habit of checking for the success of requests made through the $mech object; you can do this with its success method, or by calling is_success on the underlying HTTP::Response returned by $mech->res.
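A sketch of that habit (the URL is a placeholder; autocheck is switched off explicitly so the error handling below is actually reached):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 means failed requests do not die automatically,
# so we must check the outcome ourselves after each request.
my $mech = WWW::Mechanize->new( autocheck => 0 );

$mech->get('http://www.example.com/');   # placeholder URL

# success() reports whether the last request worked; res() returns the
# HTTP::Response object, whose status_line describes any failure.
unless ( $mech->success ) {
    die "GET failed: ", $mech->res->status_line, "\n";
}
```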

      the lowliest monk

        Instead of manually checking for success or failure after each step, I found it more convenient to have the Mechanize object die on error. This is very convenient for quick development; later, if you want a robust program, you should disable that feature again.

        ...
        my $mech = WWW::Mechanize->new( autocheck => 1 );
        # cookie jar and user agent are set implicitly
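In a quick script that might look like the following sketch (the URL and link text are placeholders): with autocheck enabled, any failed get or follow_link dies with a descriptive message, and the whole sequence can still be trapped with eval if you need to recover.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1: every failed request dies, so no per-step checks needed.
my $mech = WWW::Mechanize->new( autocheck => 1 );

eval {
    $mech->get('http://www.example.com/');              # placeholder URL
    $mech->follow_link( text_regex => qr/Downloads/ );  # placeholder link text
};
warn "Fetch aborted: $@" if $@;
```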
