
Re: TCP socket and fork

by afoken (Parson)
on Jul 04, 2009 at 11:18 UTC


in reply to TCP socket and fork

Why do you think that 20 threads could get HTTP contents faster over a single socket than over 20 distinct sockets? Because of the way HTTP works, only one thread at a time could read and write the socket, so 19 threads would have to wait for the first thread to finish. After that, 18 threads would have to wait for the second thread. And so on, until the last thread has finished. You don't need threads for that; a simple for loop is even faster, because it avoids the threading overhead.

You can accelerate HTTP by using the keepalive feature, but for that, you need an agent that you don't destroy after a single request, as you do when you call the simple get() function.
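As a minimal sketch of the keepalive idea: the script below keeps one LWP::UserAgent object alive for the whole run and fetches the URLs in a plain for loop. The helper name fetch_all, the 30 second timeout, and taking the URLs from the command line are assumptions made for this illustration, not something from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# One agent for the whole run. keep_alive => 1 gives the agent a
# connection cache with room for one idle connection, so later requests
# to the same server can reuse it instead of opening a new socket.
my $ua = LWP::UserAgent->new(keep_alive => 1, timeout => 30);

# Fetch each URL in turn with the given agent.
# Returns a list of [url, status line, byte count] entries.
sub fetch_all {
    my ($agent, @urls) = @_;
    my @results;
    for my $url (@urls) {
        my $res = $agent->get($url);
        push @results, [ $url, $res->status_line, length($res->content // '') ];
    }
    return @results;
}

# URLs come from the command line; nothing is fetched when none are given.
for my $r (fetch_all($ua, @ARGV)) {
    printf "%-50s %s (%d bytes)\n", @$r;
}
```

Whether the connection is really reused also depends on the server: it has to accept persistent connections and must not close them between requests.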

Update: Is this related to "IO::Socket, Multiple GET."?

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Re^2: TCP socket and fork
by adismaug (Acolyte) on Jul 04, 2009 at 11:29 UTC
    Dear Alexander,
    Thanks for the reply.
    Of course you are right, and I did not think of the issue that the GETs will have to be performed one after the other and not in parallel.
    Time is the most important issue for me, and I am using 20 threads so the program can get all the data at the same time. I wanted to save the time it takes for the TCP handshake and use the resources already allocated by the server.
    Do you have any idea how I can speed up the program so it takes the minimum time possible?
    How can I perform all the GET requests at the same time?
    Thanks in advance,
    Adi.

      You have various options, as I explained above.

      • Use a single socket and the HTTP/1.1 keepalive feature, then play request-response ping-pong over that socket. No threads required, just use a single LWP::UserAgent instance for this. This avoids that little bit of TCP handshake, but serialises all your requests.
      • Fork as many threads or processes as you like, and let each process fetch one resource, nearly as you do now, with 20 independent instances of LWP::UserAgent behind the scenes. This costs many TCP handshakes, but allows you to saturate your network connection (or that of the server).
      • Mix both approaches. Create a controlling thread/process that forks several slaves (let's just say four), then gives each slave a new URL to fetch as soon as the slave is idle. Use keepalive in each of the slaves. This uses most of your bandwidth and avoids some TCP handshakes. Note that the number of requests processed by each slave depends entirely on how fast it can handle its job. A slave that has to fetch a gigabyte of data will probably process only one request, while other slaves that get tiny responses will process lots of requests.
      • Simplified mix: Create just a bunch of slaves (again, let's assume four slaves), each with a constant fraction of the URL list to be processed (five entries, in this example). This does not balance as well, but requires less code. If one unlucky slave has to process five gigabyte responses, while the other slaves got away with a few kilobytes, you will wait a long time for the last slave.
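The simplified mix from the last bullet could look roughly like the sketch below: the URL list is dealt out round-robin to four forked worker processes, and each worker fetches its share over its own keepalive agent. The helper names split_round_robin and fetch_share, the worker count, and reading the URLs from the command line are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Deal a list out into $n shares of nearly equal size (round-robin).
sub split_round_robin {
    my ($n, @items) = @_;
    my @parts = map { [] } 1 .. $n;
    push @{ $parts[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @parts;
}

# Runs in a child process: fetch one share over a single keepalive agent.
sub fetch_share {
    my (@urls) = @_;
    my $ua = LWP::UserAgent->new(keep_alive => 1);
    for my $url (@urls) {
        my $res = $ua->get($url);
        print "[$$] $url: ", $res->status_line, "\n";
    }
}

my $workers = 4;        # assumed worker count, as in the example above
my @urls    = @ARGV;    # URLs to fetch, taken from the command line
my @pids;

for my $share (split_round_robin($workers, @urls)) {
    next unless @$share;    # fewer URLs than workers: skip empty shares
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {        # child: do the work, then leave
        fetch_share(@$share);
        exit 0;
    }
    push @pids, $pid;       # parent: remember the child
}

# The total run time is that of the slowest worker.
waitpid($_, 0) for @pids;
```

With 20 URLs and four workers, each worker gets five URLs. The split is fixed before any request is sent, so it cannot react to one share turning out much slower than the others.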

      Why are you so worried about TCP handshakes? A TCP handshake requires three TCP packets. A simple GET request adds one more packet, and the response uses roughly one packet for the HTTP headers and then two packets for every three KBytes of data. (Assuming we are talking about Ethernet, PPP or PPPoE.) As soon as your response is larger than a few KBytes, the TCP handshake does not really matter. Only if you (ab)use HTTP as a way to transport tons of tiny messages in some RPC protocol does the TCP handshake really matter.
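To put rough numbers on that rule of thumb, here is a small sketch that counts packets for one GET on a fresh connection and prints how large the handshake's share is. The function name packet_estimate and the sample sizes are made up for this example, and ACKs and connection teardown are ignored, so treat the output as an estimate only.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rough packet count for one HTTP GET on a fresh connection, following
# the rule of thumb above: 3 packets for the handshake, 1 for the
# request, 1 for the response headers, 2 per 3 KBytes of body.
# Returns the total packet count and the handshake's fraction of it.
sub packet_estimate {
    my ($body_bytes) = @_;
    my $handshake = 3;
    my $request   = 1;
    my $headers   = 1;
    my $body      = ceil($body_bytes * 2 / (3 * 1024));
    my $total     = $handshake + $request + $headers + $body;
    return ($total, $handshake / $total);
}

for my $size (100, 3 * 1024, 100 * 1024, 1024 * 1024) {
    my ($total, $share) = packet_estimate($size);
    printf "%8d bytes: %4d packets, handshake is %4.1f%%\n",
        $size, $total, 100 * $share;
}
```

Under these assumptions a 100 byte response needs 6 packets, half of them for the handshake, while a 100 KByte response needs 72 packets, of which the handshake is about 4%.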

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Dear Alexander,
        My concern is not so much the TCP handshake as the server resources.
        Let's say I need to get 10 pages from the server and each page takes 1 second to retrieve, so theoretically, with 10 threads, it should take one second whether I get 1 page or 10 pages.
        The problem starts when the server is busy with a lot of requests: it needs to allocate resources for each request, and therefore some of them take longer than 1 second.
        If my client were to use the same resource allocated by the server (the same TCP socket) for all my threads, I would not suffer from the delay, because the server has already allocated the resources for me.
        On the other hand, if each thread opens a new socket, then for some threads the server will delay the reply.
        This is a big problem and I cannot find a way to solve it.
        Any ideas?
        Best Regards,
        Adi.
