PerlMonks  

Re: Crawling with Parallel::ForkManager

by fullermd (Priest)
on Aug 07, 2009 at 21:53 UTC


in reply to Crawling with Parallel::ForkManager

Just because the page is there when you load it at a different time doesn't mean it was actually available when the script ran.

Specifically, "Service Temporarily Unavailable" suggests the server is refusing your connection because it thinks you already have enough (or too many) connections open to it, which is one of the things to watch out for when you do big parallel fetches. Try reducing the amount of parallelism and see if it happens less often.

Re^2: Crawling with Parallel::ForkManager
by listanand (Sexton) on Aug 07, 2009 at 22:28 UTC
    Thanks for writing. Well, I try to access the web pages right after I stop (terminate, in this case) the program, not much later.

    You are right: when I spawn 3 child processes (I have 4 right now), I see far fewer error messages. But even if I reduce it to 2 parallel connections, I still see error messages!

    I can't think of a way out.

      It really just depends on why the server is giving you the cold shoulder. I went with the most obvious cause: the number of simultaneous connections. If that's the case, dropping to 1 (i.e., not parallel at all) would resolve it. But the server may do rate-limiting, shoving you away after a given number of responses in a particular time period. It may depend on server load. It may just be flat-out random.

      Likely, the only way you can find out for sure what's up is by talking to the server admin. The best solution code-wise is to be adaptive: if you start getting errors, slow down; if you get no errors for a while, speed up. But that's a lot of work to get right.
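
The "slow down on errors, speed up after a quiet spell" idea can be sketched roughly as below. This is only an illustration of the control logic, written in Python to keep it short, and not code from the crawler in this thread. The class name `AdaptiveThrottle`, its parameters, and the choice to halve on a 503 and add one after a run of successes are all assumptions; a real server may want gentler or harsher steps. In the Perl crawler, the resulting limit would be the number you pass as the maximum process count to `Parallel::ForkManager->new`.

```python
class AdaptiveThrottle:
    """Hypothetical helper: shrink the parallel-fetch limit on 503s, grow it slowly otherwise."""

    def __init__(self, start=4, floor=1, ceiling=8, calm_streak=20):
        self.limit = start              # current number of parallel fetches allowed
        self.floor = floor              # never go below this (1 = not parallel at all)
        self.ceiling = ceiling          # never go above this
        self.calm_streak = calm_streak  # error-free responses needed before speeding up
        self._streak = 0                # error-free responses seen since the last change

    def record(self, status):
        """Feed in one HTTP status code; return the (possibly updated) limit."""
        if status == 503:
            # The server pushed back: halve the limit, but stay at or above the floor.
            self.limit = max(self.floor, self.limit // 2)
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.calm_streak:
                # Quiet for a while: cautiously allow one more parallel fetch.
                self.limit = min(self.ceiling, self.limit + 1)
                self._streak = 0
        return self.limit


# Made-up sequence of responses, with a short calm streak so the effect is visible.
throttle = AdaptiveThrottle(start=4, calm_streak=3)
history = [throttle.record(s) for s in (200, 503, 200, 200, 200, 503, 503)]
print(history)
```

Backing off quickly and recovering slowly is a deliberate choice here: it errs on the side of leaving the server alone, which matters if the refusals come from rate-limiting rather than from a connection cap.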
