PerlMonks  

Mechanize giving errors for no reason

by Anonymous Monk
on Jun 12, 2013 at 20:58 UTC ( #1038587=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So, I'm trying to make a web crawler, and WWW::Mechanize gives me errors on links that are perfectly valid. I even tried fetching one of those links outside my recursive crawler, and it worked. I'm guessing the website stops serving me pages after a certain request limit is reached, like the Google Search API does. Do some websites limit how much of their site you can crawl?

Replies are listed 'Best First'.
Re: Mechanize giving errors for no reason
by space_monk (Chaplain) on Jun 12, 2013 at 21:25 UTC
    The connection could also be rate limited in some way. You'd perhaps do better putting some code up rather than having us make random guesses... :-)
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Re: Mechanize giving errors for no reason
by pemungkah (Priest) on Jun 13, 2013 at 03:27 UTC
    You might also be getting rate limited because your user agent string says you're WWW::Mechanize. Change it with the agent argument to new(). But yes, do check if there's a rate limit, and make sure you're conforming to the site's requirements for access.
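A minimal sketch of setting the agent string through new(), assuming WWW::Mechanize is installed. The crawler name and contact URL are placeholders, not values from this thread.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical agent string; substitute your crawler's real name and contact.
my $agent = 'ExampleCrawler/0.1 (+http://example.com/crawler-info)';

# The agent argument is handed on to LWP::UserAgent, which
# WWW::Mechanize subclasses.
my $mech = WWW::Mechanize->new( agent => $agent );

# agent() with no arguments returns the string that will be sent.
print $mech->agent, "\n";
```

Whether a different agent string changes how a given site responds depends on that site's policy, so it is worth checking its terms and robots.txt as well.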
Re: Mechanize giving errors for no reason
by talexb (Chancellor) on Jun 13, 2013 at 19:39 UTC

    Yes, some sites do watch to see that clients aren't hammering their site -- which is reasonable, when you think about it.
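One way to avoid hammering a site is to enforce a minimum gap between requests. The sketch below uses only core modules; make_throttle and the 0.2 second interval are illustrative choices, and the right delay depends on the site being crawled.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Returns a closure that, when called, sleeps just long enough to keep
# at least $min_interval seconds between successive calls.
sub make_throttle {
    my ($min_interval) = @_;
    my $last;    # time of the previous call; undef before the first one
    return sub {
        if ( defined $last ) {
            my $wait = $min_interval - ( time() - $last );
            sleep($wait) if $wait > 0;
        }
        $last = time();
        return;
    };
}

# Hypothetical usage: pause before every request in the crawl loop.
my $throttle = make_throttle(0.2);
for my $url (qw(http://example.com/a http://example.com/b)) {
    $throttle->();
    print "would fetch $url\n";    # replace with $mech->get($url)
}
```

In a recursive crawler, the same closure can be called at the top of the function that fetches a page, so every request goes through one shared delay.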

    We'd be able to help you a great deal more if you were more specific about *which* errors you were seeing.

    And .. do you think that Mechanize *randomly* spits out errors? There's almost always a very good reason for errors.
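To see which errors the crawler is actually hitting, it can log the status line of each failed request. This is a sketch assuming WWW::Mechanize is installed; fetch_error is a hypothetical helper, and URLs are taken from the command line only for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# With autocheck off, a failed get() returns normally instead of dying,
# so each response can be inspected and logged.
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Returns undef on success, or the response status line on failure.
sub fetch_error {
    my ( $mech, $url ) = @_;
    $mech->get($url);
    return $mech->success ? undef : $mech->response->status_line;
}

# Report the outcome for every URL given on the command line.
for my $url (@ARGV) {
    my $err = fetch_error( $mech, $url );
    print defined $err ? "FAILED $url: $err\n" : "ok $url\n";
}
```

The status line shows what the server said. A 403, 429, or 503 that appears only after many requests would be consistent with blocking or rate limiting, while a 404 or a connection error points elsewhere.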

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
