Mechanize giving errors for no reason

by Anonymous Monk
on Jun 12, 2013 at 20:58 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So, I'm trying to make a web crawler, and I'm getting errors from WWW::Mechanize even though the links are perfectly valid. I even tried fetching one of the links on its own, outside my recursive crawler, and it worked. I'm guessing the website is blocking my requests after a certain limit is reached, much like the Google Search API does. Do some websites limit how much you can crawl through their site?

Replies are listed 'Best First'.
Re: Mechanize giving errors for no reason
by space_monk (Chaplain) on Jun 12, 2013 at 21:25 UTC
    The connection could also be rate limited in some way. You'd do better to post some code rather than having us make random guesses... :-)
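    If it does turn out to be rate limiting, a crude way to stay on the site's good side is to pause between requests. A rough sketch (the one-second delay and the starting URL are placeholders, not anything from your code):

        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new( autocheck => 0 );

        my @queue = ('http://example.com/');   # hypothetical starting point
        my %seen;

        while ( my $url = shift @queue ) {
            next if $seen{$url}++;

            my $response = $mech->get($url);
            unless ( $response->is_success ) {
                warn "GET $url failed: ", $response->status_line, "\n";
                next;
            }

            # Queue the absolute URLs of the links on this page.
            push @queue, map { $_->url_abs->as_string } $mech->links;

            sleep 1;    # be polite: at most one request per second
        }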
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Re: Mechanize giving errors for no reason
by pemungkah (Priest) on Jun 13, 2013 at 03:27 UTC
    You might also be getting rate limited because your user agent string says you're WWW::Mechanize. Change it via the agent argument to new(). But yes, do check whether there's a rate limit, and make sure you're conforming to the site's requirements for access.
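    For example (the agent string below is only an illustration; use something that honestly identifies your crawler, or one of Mechanize's built-in browser aliases, as the site's terms allow):

        use WWW::Mechanize;

        # Replace the default "WWW-Mechanize/<version>" agent string.
        my $mech = WWW::Mechanize->new(
            agent => 'MyCrawler/0.1 (+http://example.com/about-crawler)',
        );

        # Or pretend to be a browser via a predefined alias:
        # $mech->agent_alias('Windows Mozilla');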
Re: Mechanize giving errors for no reason
by talexb (Chancellor) on Jun 13, 2013 at 19:39 UTC

    Yes, some sites do watch to see that clients aren't hammering their site -- which is reasonable, when you think about it.

    We'd be able to help you a great deal more if you were more specific about *which* errors you were seeing.

    And .. do you think that Mechanize *randomly* spits out errors? There's almost always a very good reason for errors.
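
    For instance, with autocheck turned off you can print the exact status line the server returns (a minimal sketch; the URL is just a stand-in):

        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new( autocheck => 0 );

        my $url      = 'http://example.com/some/page';   # stand-in URL
        my $response = $mech->get($url);

        unless ( $response->is_success ) {
            # e.g. "403 Forbidden" or "429 Too Many Requests"
            print "Error fetching $url: ", $response->status_line, "\n";
        }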

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
