You have various options, as I explained above.
- Use a single socket and the HTTP/1.1 keepalive feature, then play request-response ping-pong over that socket. No threads required, just use a single LWP::UserAgent instance for this. This avoids the repeated TCP handshakes, but serialises all your requests.
- Fork as many threads or processes as you like, and let each process fetch one resource, nearly as you do now, with 20 independent instances of LWP::UserAgent behind the scenes. This costs many TCP handshakes, but allows you to saturate your network connection (or that of the server).
- Mix both approaches. Create a controlling thread/process that forks several slaves (let's just say four), then gives each slave a new URL to fetch as soon as the slave is idle. Use keepalive in each of the slaves. This uses most of your bandwidth and avoids some TCP handshakes. Note that the number of requests processed by each slave depends entirely on how fast it can handle its job. A slave that has to fetch a gigabyte of data will probably process only one request, while other slaves that get tiny responses will process lots of requests.
- Simplified mix: Create just a bunch of slaves (again, let's assume four slaves), each with a constant fraction of the URL list to be processed (five entries, in this example). This does not balance as well, but requires less code. If one unlucky slave has to process five gigabyte responses, while the other slaves got away with a few kilobytes, you will wait a long time for the last slave.
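The simplified mix could look roughly like the sketch below. It is not a finished implementation: `split_urls`, `fetch_chunk` and `run` are made-up names, the worker count of four is just the number from the example above, and the URLs are taken from the command line. Each worker gets its own LWP::UserAgent with `keep_alive` enabled, which only saves handshakes when consecutive URLs in a chunk point at the same host.

```perl
use strict;
use warnings;

# Deal the URLs round-robin into $n lists, one list per worker.
sub split_urls {
    my ($n, @urls) = @_;
    my @chunks = map { [] } 1 .. $n;
    push @{ $chunks[ $_ % $n ] }, $urls[$_] for 0 .. $#urls;
    return @chunks;
}

# One worker: a single LWP::UserAgent with keepalive, fetching its chunk serially.
sub fetch_chunk {
    my (@urls) = @_;
    require LWP::UserAgent;    # loaded lazily, only the workers need it
    my $ua = LWP::UserAgent->new(keep_alive => 1);
    for my $url (@urls) {
        my $res = $ua->get($url);
        printf "[%d] %s %s\n", $$, $res->status_line, $url;
    }
}

# Fork one process per non-empty chunk, then wait for all of them.
sub run {
    my ($workers, @urls) = @_;
    my @pids;
    for my $chunk (split_urls($workers, @urls)) {
        next unless @$chunk;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {       # child process
            fetch_chunk(@$chunk);
            exit 0;
        }
        push @pids, $pid;      # parent remembers the child
    }
    waitpid($_, 0) for @pids;
}

# Assumed usage: perl fetch.pl URL1 URL2 ...
run(4, @ARGV) if @ARGV;
```

With 20 URLs and four workers, each worker ends up with five URLs, as in the example above. The parent simply waits for the slowest worker, so the balancing problem described in the last option is still there.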
Why are you so worried about TCP handshakes? A TCP handshake requires three TCP packets. A simple GET request adds one more packet, and the response uses roughly one packet for the HTTP headers and then two packets for every three KBytes of data (assuming we are talking about Ethernet, PPP or PPPoE). As soon as your response is larger than a few KBytes, the TCP handshake does not really matter. If you (ab)use HTTP as a way to transport tons of tiny messages in some RPC protocol, the TCP handshake really does matter.
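To put numbers on that, here is a back-of-the-envelope sketch using only the rough packet counts from the paragraph above (three for the handshake, one for the GET, one for the headers, two per three KBytes of data). The function name `handshake_share` is made up, and the model deliberately ignores ACKs and connection teardown, so treat the output as an order-of-magnitude estimate only.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Fraction of all packets in one request that belong to the TCP handshake,
# using the rough per-request packet counts from the text.
sub handshake_share {
    my ($kbytes) = @_;
    my $handshake = 3;                        # SYN, SYN/ACK, ACK
    my $request   = 1;                        # a simple GET
    my $headers   = 1;                        # response headers
    my $data      = ceil($kbytes * 2 / 3);    # two packets per three KBytes
    return $handshake / ($handshake + $request + $headers + $data);
}

printf "%5d KBytes: handshake is %4.1f%% of the packets\n",
    $_, 100 * handshake_share($_)
    for 0, 3, 30, 300, 3000;
```

For an empty response the handshake makes up 60% of the packets, for a 30 KByte response it is down to 12%, and it keeps shrinking from there.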
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)