PerlMonks  

Re: What Tools Do You Use With WWW::Mechanize

by OfficeLinebacker (Chaplain)
on Oct 03, 2011 at 22:43 UTC


in reply to What Tools Do You Use With WWW::Mechanize

A technique I probably *should* be using with Mech is to either cache content locally or add random delays between loading a page and clicking a link on it, because some servers don't like bots accessing their sites. Also, I copy the UA string verbatim from my browser, which works, rather than trying to figure out exactly which parts of it I need.
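
As a rough illustration of the local-caching idea, here is a minimal sketch that uses only core modules. The `cached_fetch` helper, its arguments, and the stub fetcher are hypothetical names made up for this example; in a real script the fetcher would wrap whatever Mech call you already use.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the cached content for $url if present,
# otherwise call $fetcher->($url), save the result, and return it.
sub cached_fetch {
    my ($cache_dir, $url, $fetcher) = @_;
    my $file = File::Spec->catfile($cache_dir, md5_hex($url));

    if (-e $file) {
        open my $in, '<:raw', $file or die "Cannot read $file: $!";
        local $/;                      # slurp the whole file
        my $content = <$in>;
        close $in;
        return $content;
    }

    my $content = $fetcher->($url);
    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $content;
    close $out or die "Cannot close $file: $!";
    return $content;
}

# Demonstration with a stub fetcher; a real script might instead pass
#   sub { $mech->get($_[0]); $mech->content }
my $cache_dir = tempdir(CLEANUP => 1);
my $calls     = 0;
my $stub      = sub { $calls++; return "<html>page for $_[0]</html>" };

my $first  = cached_fetch($cache_dir, 'http://example.com/', $stub);
my $second = cached_fetch($cache_dir, 'http://example.com/', $stub);
print "fetcher called $calls time(s)\n";
```

The second call is served from disk, so the remote server is hit only once per URL. The sketch never expires entries and assumes byte content, so anything long-running would need more than this.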


I like computer programming because it's like Legos for the mind.


Re^2: What Tools Do You Use With WWW::Mechanize
by OfficeLinebacker (Chaplain) on Oct 04, 2011 at 18:27 UTC
    Greetings, esteemed monks!

    To reply to my own concern, I came up with this for generating wait times between link clicking/back() calls in a Mech script. What do you think?

        #!/usr/bin/perl --
        use strict;
        use warnings;

        my $i1 = int(rand(5)+1);
        my $i2 = int(rand(2));
        my $i  = 0;
        while ($i<10){
            print "$i1: $i2\n";
            my $interval = $i1 + ($i2*$i1);
            print "waiting for $interval seconds...\n";
            sleep($interval);
            $i1 = int(rand(5)+1);
            $i2 = int(rand(2));
            $i++;
        }

    Sample output:
        1: 0
        waiting for 1 seconds...
        4: 0
        waiting for 4 seconds...
        1: 1
        waiting for 2 seconds...
        4: 1
        waiting for 8 seconds...
        4: 0
        waiting for 4 seconds...
        4: 1
        waiting for 8 seconds...
        4: 0
        waiting for 4 seconds...
        5: 0
        waiting for 5 seconds...
        1: 1
        waiting for 2 seconds...
        1: 1
        waiting for 2 seconds...
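
    The interval arithmetic in the loop above can also be pulled into a small helper so it is easy to drop into a Mech script. This is only a sketch: `random_interval` is a name made up here, and it prints sample values instead of actually sleeping.

```perl
use strict;
use warnings;

# Same arithmetic as the loop above: a base of 1 to 5 seconds,
# doubled about half the time, so results fall between 1 and 10.
sub random_interval {
    my $base   = int(rand(5)) + 1;   # 1, 2, 3, 4 or 5
    my $double = int(rand(2));       # 0 or 1
    return $base * (1 + $double);
}

my @samples = map { random_interval() } 1 .. 10;
print "intervals: @samples\n";

# In a Mech script: sleep(random_interval()) before each click or back().
```

    Because the base is either kept or doubled, 7 and 9 never come up; if that matters, a plain int(rand(10)) + 1 would give every whole second from 1 to 10.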

      OfficeLinebacker,
      Have you seen WWW::Mechanize::Sleepy? Personally, I use something along the lines of:
          use constant OK => 200;    # assumed to mean HTTP status 200

          # Sleep a random interval between $duration and 2 * $duration - 1 units
          sub rest {
              my ($duration) = @_;
              sleep $duration;
              sleep rand($duration);
          }

          # Try $action on $target up to $max times, resting before each attempt
          sub fetch_page {
              my ($mech, $action, $target, $max, $duration) = @_;
              for (1 .. $max) {
                  rest($duration);
                  eval { $mech->$action($target); };
                  return if ! $@ && $mech->status == OK;
              }
              die "Failed to fetch '$target' after '$max' attempts\n";
          }

      Of course, if you want to allow for HTTP redirects then you will need to change status == OK to include acceptable HTTP codes. Additionally, if you use Time::HiRes to overload sleep, you can easily sleep for partial minutes. In truth, I typically use milliseconds.
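
      For the Time::HiRes point, a minimal sketch of a fractional-second version of rest might look like the following. Importing sleep and time from Time::HiRes replaces the built-ins in the current file; the 0.2 second duration and the return value are just choices made for this demonstration.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);   # fractional-second sleep and time

# Sleep between $duration and 2 * $duration seconds, fractions included,
# and return the interval that was requested.
sub rest {
    my ($duration) = @_;
    my $interval = $duration + rand($duration);
    sleep($interval);
    return $interval;
}

my $start   = time();
my $slept   = rest(0.2);           # somewhere from 0.2 up to 0.4 seconds
my $elapsed = time() - $start;
printf "asked for %.3f s, waited %.3f s\n", $slept, $elapsed;
```

      With the built-in sleep, the fractional part would be truncated, so a call like rest(0.2) would not pause at all.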

      Cheers - L~R

        I hadn't even heard of WWW::Mechanize::Sleepy! That's great. During debugging I get impatient but once the product is rolled out it only has to run once a day or so, so using seconds is fine. Also I think you mean "you can easily sleep for partial seconds," right?

