http://www.perlmonks.org?node_id=436190


in reply to Re^2: What is the fastest way to download a bunch of web pages?
in thread What is the fastest way to download a bunch of web pages?

If you want solid advice based on just a few raw specs, hire a consultant. There are plenty of consultants happy to make a quick buck by giving advice based on the numbers alone. But you're mistaken if you think there's a table that says, for those specs, this or that is the best algorithm.

As for why disk I/O matters: I'm assuming you want to store your results, and that you're downloading a significant amount of data, enough that you can't keep it all in memory. So you have to write to disk, which makes the disk a potential bottleneck. If all the servers you download from are on your local LAN, you could easily receive more data per second over the network than your disk can write, depending of course on the disk(s) and the network.
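
To illustrate avoiding the in-memory buildup, here's a minimal sketch that streams each response straight to a file using LWP::UserAgent's :content_file option. The @urls list and the naive filename scheme are my own assumptions for the example, not anything from the original question.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use URI;

    # Hypothetical list of URLs to fetch -- substitute your own.
    my @urls = ('http://www.example.com/a.html', 'http://www.example.com/b.html');

    my $ua = LWP::UserAgent->new(timeout => 30);

    for my $url (@urls) {
        # Derive a local filename from the last path segment (naive, for illustration).
        my $name = (URI->new($url)->path_segments)[-1] || 'index.html';

        # :content_file makes LWP write the body to disk as it arrives,
        # so the whole document never has to sit in memory.
        my $res = $ua->get($url, ':content_file' => $name);
        warn "$url: ", $res->status_line, "\n" unless $res->is_success;
    }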

Of course, if all you care about is downloading a handful of pages, each from a different server, in a reasonably short time, perhaps something as simple as:

system "wget $_ &" for @urls;   # fire off one backgrounded wget per URL
will be good enough. But that doesn't work well if you need to download 10,000 documents, all from the same server.
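
For that many documents from a single server, something that caps the number of simultaneous requests is friendlier both to the server and to your own machine. Here's a minimal sketch using Parallel::ForkManager with LWP::UserAgent; the limit of 5 concurrent children, the example URL pattern, and the output filenames are assumptions of mine, not anything prescribed above.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Parallel::ForkManager;

    # Hypothetical URL list -- in reality you'd build this from your own data.
    my @urls = map { "http://www.example.com/doc$_.html" } 1 .. 10_000;

    # Allow at most 5 children at once, so the one server isn't hit
    # with thousands of simultaneous connections.
    my $pm = Parallel::ForkManager->new(5);
    my $ua = LWP::UserAgent->new(timeout => 30);

    for my $i (0 .. $#urls) {
        $pm->start and next;    # parent: spawn a child, move on to the next URL

        # Child: fetch the document and stream it straight to disk.
        my $res = $ua->get($urls[$i], ':content_file' => "doc$i.html");
        warn "$urls[$i]: ", $res->status_line, "\n" unless $res->is_success;

        $pm->finish;            # child exits
    }
    $pm->wait_all_children;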