PerlMonks  

Re: What Tools Do You Use With WWW::Mechanize

by OfficeLinebacker (Chaplain)
on Oct 03, 2011 at 22:43 UTC (#929425)


in reply to What Tools Do You Use With WWW::Mechanize

I'll tell you a technique I probably *should* be using with Mech: either cache content locally or introduce random delays between loading a page and clicking a link on it, because some servers don't like bots accessing their sites. Also, I copy the UA string verbatim from my browser, which works, rather than trying to figure out exactly which parts of it I need.
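For what it's worth, here is a minimal sketch of the local-cache idea, using only core modules. The helper name `cached_fetch`, the MD5-of-URL file naming, and the stub fetcher are all assumptions made up for illustration; nothing here comes from WWW::Mechanize itself. The fetcher is passed in as a code reference so the cache logic can be tried without touching a real server.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the cached copy of $url if one exists,
# otherwise call $fetch->($url), store the result on disk, and return it.
sub cached_fetch {
    my ($cache_dir, $url, $fetch) = @_;
    my $file = File::Spec->catfile($cache_dir, md5_hex($url));

    if (-e $file) {
        open my $in, '<:encoding(UTF-8)', $file
            or die "Cannot read '$file': $!\n";
        local $/;    # slurp the whole file
        my $content = <$in>;
        close $in;
        return $content;
    }

    # Cache miss: this is the only place the server would be contacted.
    my $content = $fetch->($url);
    open my $out, '>:encoding(UTF-8)', $file
        or die "Cannot write '$file': $!\n";
    print {$out} $content;
    close $out;
    return $content;
}

# Stub fetcher standing in for a real Mech call, so the sketch runs offline.
my $dir   = tempdir(CLEANUP => 1);
my $calls = 0;
my $stub  = sub { $calls++; return "<html>page for $_[0]</html>" };

my $first  = cached_fetch($dir, 'http://example.com/', $stub);
my $second = cached_fetch($dir, 'http://example.com/', $stub);
print "fetcher called $calls time(s)\n";
```

With a real Mech object, the fetcher could be something like `sub { sleep 1 + int rand 5; $mech->get($_[0]); $mech->content }`, and the browser's UA string can be set once with `$mech->agent($ua_string)`. The 1 to 5 second range is arbitrary.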


I like computer programming because it's like Legos for the mind.


Re^2: What Tools Do You Use With WWW::Mechanize
by OfficeLinebacker (Chaplain) on Oct 04, 2011 at 18:27 UTC
    Greetings, esteemed monks!

    To reply to my own concern, I came up with this for generating wait times between link clicking/back() calls in a Mech script. What do you think?

    #!/usr/bin/perl --
    use strict;
    use warnings;

    my $i1 = int(rand(5) + 1);
    my $i2 = int(rand(2));
    my $i  = 0;
    while ($i < 10) {
        print "$i1: $i2\n";
        my $interval = $i1 + ($i2 * $i1);
        print "waiting for $interval seconds...\n";
        sleep($interval);
        $i1 = int(rand(5) + 1);
        $i2 = int(rand(2));
        $i++;
    }

    Sample output:
    1: 0
    waiting for 1 seconds...
    4: 0
    waiting for 4 seconds...
    1: 1
    waiting for 2 seconds...
    4: 1
    waiting for 8 seconds...
    4: 0
    waiting for 4 seconds...
    4: 1
    waiting for 8 seconds...
    4: 0
    waiting for 4 seconds...
    5: 0
    waiting for 5 seconds...
    1: 1
    waiting for 2 seconds...
    1: 1
    waiting for 2 seconds...

      OfficeLinebacker,
      Have you seen WWW::Mechanize::Sleepy? Personally, I use something along the lines of:
      use constant OK => 200;    # HTTP status code for success

      # Sleep a random interval between $duration and 2 * $duration - 1 units
      sub rest {
          my ($duration) = @_;
          sleep $duration;
          sleep rand($duration);
      }

      # Try $mech->$action($target) up to $max times, resting before each attempt
      sub fetch_page {
          my ($mech, $action, $target, $max, $duration) = @_;
          for (1 .. $max) {
              rest($duration);
              eval { $mech->$action($target); };
              return if !$@ && $mech->status == OK;
          }
          die "Failed to fetch '$target' after '$max' attempts\n";
      }

      Of course, if you want to allow for HTTP redirects then you will need to change status == OK to include acceptable HTTP codes. Additionally, if you use Time::HiRes to overload sleep, you can easily sleep for partial minutes. In truth, I typically use milliseconds.
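As a rough sketch of those two adjustments: the snippet below keeps a set of acceptable status codes and rests for a fractional number of seconds via Time::HiRes. The names `status_acceptable` and `rest_hires` are hypothetical, and the list of codes (200, 301, 302) is only an assumption; pick whatever your target site actually returns.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # replaces the built-in sleep here; accepts fractions

# Assumed set of acceptable codes; adjust as needed.
my %ACCEPTABLE = map { $_ => 1 } (200, 301, 302);

sub status_acceptable {
    my ($code) = @_;
    return exists $ACCEPTABLE{$code} ? 1 : 0;
}

# Sleep between $duration and 2 * $duration seconds (fractions allowed)
# and return the length of the pause that was requested.
sub rest_hires {
    my ($duration) = @_;
    my $total = $duration + rand($duration);
    sleep($total);
    return $total;
}

my $slept = rest_hires(0.05);    # somewhere between 50 and 100 milliseconds
printf "rested for %.3f seconds\n", $slept;
print "302 acceptable? ", (status_acceptable(302) ? "yes" : "no"), "\n";
print "404 acceptable? ", (status_acceptable(404) ? "yes" : "no"), "\n";
```

In a retry loop like fetch_page, the check `$mech->status == OK` would then become `status_acceptable($mech->status)`.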

      Cheers - L~R

        I hadn't even heard of WWW::Mechanize::Sleepy! That's great. During debugging I get impatient, but once the product is rolled out it only has to run once a day or so, so using seconds is fine. Also, I think you mean "you can easily sleep for partial seconds," right?

