http://www.perlmonks.org?node_id=436260


in reply to What is the fastest way to download a bunch of web pages?

Just to make things more interesting, I'd suggest you take a look at an event-based approach, for example via POE (POE::Component::Client::HTTP) or the like.

But I'd suggest that you keep this in the back of your head and leave it for the future, because it requires that you think about non-blocking I/O, the order in which things happen, and so on.

It was pretty hard for me personally to write a web crawler like that.

But anyway, it *is* possible to increase fetching performance to roughly 10K to 20K URLs/hour using such an approach, and that is with a single process.
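Here is a minimal sketch of what such an event-based fetcher might look like with POE::Component::Client::HTTP. The @urls list, the alias name, and the timeout are placeholders, not anything from the original post; it only illustrates the request/response event pattern.

  use strict;
  use warnings;
  use POE qw(Component::Client::HTTP);
  use HTTP::Request;

  my @urls = ('http://www.example.com/', 'http://www.example.org/');  # placeholder list

  # One non-blocking HTTP client component, addressed by its alias 'ua'.
  POE::Component::Client::HTTP->spawn(Alias => 'ua', Timeout => 30);

  POE::Session->create(
      inline_states => {
          _start => sub {
              # Queue every request up front; responses come back as events.
              for my $url (@urls) {
                  $_[KERNEL]->post('ua', 'request', 'got_response',
                                   HTTP::Request->new(GET => $url));
              }
          },
          got_response => sub {
              my ($request_packet, $response_packet) = @_[ARG0, ARG1];
              my $request  = $request_packet->[0];
              my $response = $response_packet->[0];
              print $request->uri, ' => ', $response->status_line, "\n";
          },
      },
  );

  POE::Kernel->run();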


Re^2: What is the fastest way to download a bunch of web pages?
by tphyahoo (Vicar) on Mar 03, 2005 at 16:27 UTC
    Sounds promising. Any open code to do this?

      If you're looking to control how many child processes run at once, Parallel::ForkManager may be helpful. The example in its documentation specifically demonstrates what I think you're trying to accomplish; a sketch along those lines follows.
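      A minimal sketch, assuming LWP::Simple for the actual fetching and a made-up limit of 10 simultaneous children; the URL list and the output file names are placeholders.

        use strict;
        use warnings;
        use LWP::Simple qw(getstore);
        use Parallel::ForkManager;

        my @urls = ('http://www.example.com/', 'http://www.example.org/');  # placeholder list
        my $pm   = Parallel::ForkManager->new(10);   # at most 10 children at a time

        my $i = 0;
        for my $url (@urls) {
            my $n = ++$i;                       # counted in the parent, inherited by the child
            $pm->start and next;                # parent forks a child and moves on to the next URL
            getstore($url, "page_$n.html");     # child does the blocking download
            $pm->finish;                        # child exits
        }
        $pm->wait_all_children;

      Each URL still blocks inside its own child, but with the pool capped at 10 you get up to 10 downloads in flight at once without writing any event-loop code.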