PerlMonks
TCP Socket, Forking, Memory exhaustion

by asuter (Initiate)
on Nov 07, 2007 at 10:41 UTC

asuter has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody

I would like to code a TCP server that has to accept socket connections from more than 2,000 clients. The clients would perform some sort of FTP-like communication, i.e. request-reply-request-reply-...

The connections would stay established for the whole FTP-like session, which can last several hours.

The problem: currently I fork the server for each newly accepted client (the standard way to let the parent listen again immediately). But with this approach I duplicate the Perl script (server), so memory usage increases. Say the server script needs 5MB, and let there be 10 concurrent clients; then memory usage would already be 50MB (+5MB for the parent). Is there a better approach that minimizes memory usage?
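A minimal sketch of that fork-per-client pattern (the port number and the echo-style handler are placeholders for illustration, not the actual protocol):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ':sys_wait_h';

# Fork-per-client accept loop: the parent goes straight back to accept(),
# each child serves exactly one connection.
sub run_forking_server {
    my ($listener) = @_;
    # Reap finished children so they don't accumulate as zombies.
    local $SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };
    while (1) {
        my $client = $listener->accept or next;  # retry if interrupted by SIGCHLD
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                         # child: serve this client only
            close $listener;
            while (my $line = <$client>) {
                print $client "reply: $line";    # placeholder request handler
            }
            exit 0;
        }
        close $client;                           # parent: back to accept()
    }
}

# Usage (blocks forever; port 9000 is an assumption):
#   my $listener = IO::Socket::INET->new(
#       LocalPort => 9000, Listen => SOMAXCONN, ReuseAddr => 1)
#       or die "listen: $!";
#   run_forking_server($listener);
```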

Are there any tutorials and/or good books out there that cover this problem? The communication I am trying to implement is a little trickier than plain FTP: after the connection is established, both the server and the client can send requests.

My implementation: in a loop, I call can_read(1) from IO::Select on the socket to check whether there is new data (a request or perhaps a reply) to be processed. If there is, I read the data and invoke a script that handles it (depending on whether it is a request or a reply). If not, I search a specified directory for text files (created by other scripts) that can be sent through the socket; the content of each text file is either a reply or a request. Then I sleep for one second and restart the loop.
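That loop might look roughly like the sketch below; handle_incoming is a hypothetical dispatcher the real code would supply, and the outbox-directory layout is an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;

# Per-connection polling loop: wait up to 1s for peer data, otherwise
# flush any queued outbound messages found as text files in $outbox_dir.
sub serve_connection {
    my ($socket, $outbox_dir) = @_;
    $socket->autoflush(1);                     # don't let replies sit in buffers
    my $sel = IO::Select->new($socket);
    while (1) {
        if ($sel->can_read(1)) {               # wait up to 1s for peer data
            my $n = sysread $socket, my $buf, 65536;
            last unless $n;                    # 0 or undef: peer closed / error
            handle_incoming($buf);             # hypothetical request/reply dispatcher
        }
        for my $file (glob "$outbox_dir/*.txt") {   # queued outbound messages
            open my $fh, '<', $file or next;
            local $/;                          # slurp the whole file
            print $socket scalar <$fh>;
            close $fh;
            unlink $file;
        }
        sleep 1;
    }
}
```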

Any suggestions on how I could improve this complicated communication?

Thanks for your answer(s)
Adrian

Replies are listed 'Best First'.
Re: TCP Socket, Forking, Memory exhaustion
by BrowserUk (Patriarch) on Nov 07, 2007 at 11:03 UTC

    Are you sure that forking a script doubles your memory usage?

    I thought that on most systems, running a second copy of a program (perl in this case) would only increase overall memory usage by the size of the read-write data area, as the read-only data and code memory would be shared by the processes. Ditto for all subsequent copies.

    If you're using something like top or ps to assess the memory usage, it may be telling you porkies. Try starting one copy and recording the total free memory figure; then start a second copy and record the total free memory figure again. The difference will be your actual per-process consumption, which in many cases will be far less than the figure suggested by ps/top.
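On Linux, that before/after snapshot can be scripted by reading /proc/meminfo directly (Linux-specific; the server.pl name in the usage comment is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the kernel's current MemFree figure in kB (Linux only).
sub mem_free_kb {
    open my $fh, '<', '/proc/meminfo' or die "open /proc/meminfo: $!";
    while (my $line = <$fh>) {
        return $1 if $line =~ /^MemFree:\s+(\d+)\s+kB/;
    }
    die "MemFree not found in /proc/meminfo";
}

# Usage: note free memory, start the second server copy, note it again.
#   my $before = mem_free_kb();
#   system("perl server.pl &");    # hypothetical server script
#   sleep 2;
#   printf "approx. per-process cost: %d kB\n", $before - mem_free_kb();
```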


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It's called copy-on-write. All the pages used by the process are marked this way; whenever one of the forked processes tries to write to such a page, a trap is raised and the OS copies the page.

      Oha

        Yes I know. (On Unix that is. They are just "shared memory segments" on Win32 for example).

        But it's not as beneficial as people think for perl processes, because "compiled" Perl code and Perl program data are both RW memory to the perl executable. And even read references to perl variables can cause write accesses (and therefore COW) to the memory that holds them.

        But the main point of my post is that ps/top will often, if not always, count the shared, readonly memory (the perl executable's code and RO data) against each process that shares it, which may be giving the OP a false impression of how much memory is consumed by his existing forking solution.


Re: TCP Socket, Forking, Memory exhaustion
by apl (Monsignor) on Nov 07, 2007 at 10:58 UTC
Re: TCP Socket, Forking, Memory exhaustion
by moritz (Cardinal) on Nov 07, 2007 at 10:49 UTC
    You could use threads instead of forked processes; that should decrease memory usage.

    If you don't want to rebuild a complete server infrastructure and you have high demand, maybe you could reuse Apache 2. Apache 2 is a server framework with an HTTP server plugged in (roughly speaking). You could insert your own server plugin there.

    Or maybe you could tell us what you need that server for, maybe there is an easier way that reuses another server application.

      You could use threads instead of forked processes, that should decrease memory usage.

      To follow that advice also requires not using Perl, because Perl's idea of "threads" actually uses more memory than forking (on systems that support real forking, not fork emulated using Perl threads, obviously). Perl threads also use the extra memory less efficiently, greatly increasing the cost of thread creation (and destruction).

      For this situation, switching to Perl threads would have significant disadvantages and no advantages.

      Yes, I realize that you are suggesting using something other than a Perl script as the heart of the server infrastructure. But I felt that your opening sentence required some clarification since it is the completely wrong approach when dealing with a server written in Perl, and that wasn't made clear.

      - tye        

Re: TCP Socket, Forking, Memory exhaustion
by weismat (Friar) on Nov 07, 2007 at 12:21 UTC
    You should try to get Stein's "Network Programming with Perl"; it is a very good book on this topic. A TCP server implemented with threads is also covered (chapter 11), even though that chapter uses the old thread model rather than the new one. I think you should be able to download the source code somewhere.
Re: TCP Socket, Forking, Memory exhaustion
by sgt (Deacon) on Nov 07, 2007 at 15:17 UTC

    First, I would recommend having a look at some generic solutions on CPAN, so that you can quickly prototype different approaches. Some good ones are Net::Server, IO::Multiplex, and IO::SessionSet/IO::SessionData in the lib directory of the Perl network programming site (for non-blocking sockets, in case you cannot trust your clients, which is almost always the case ;)).

    Secondly, if you maintain lots of open connections, a select loop might be the way to go. Multiplexing across a few "select-loop" servers could make the approach more scalable (a request is sent to one of the children in the pool, and that child does the accept and adds the descriptor to its "select" set).
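A single-process select loop of the kind described here might be sketched as follows (the reply: handler is a placeholder, and real code would also need to buffer partial reads):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# One process multiplexing all clients through a single select() set,
# instead of one process per connection.
sub run_select_server {
    my ($listener) = @_;
    my $sel = IO::Select->new($listener);
    while (1) {
        for my $fh ($sel->can_read) {
            if (fileno($fh) == fileno($listener)) {
                my $client = $listener->accept or next;  # new connection
                $sel->add($client);
            }
            else {
                my $n = sysread $fh, my $buf, 65536;
                if ($n) {
                    print $fh "reply: $buf";             # placeholder handler
                }
                else {                                   # peer closed: drop it
                    $sel->remove($fh);
                    close $fh;
                }
            }
        }
    }
}

# Usage (blocks forever; port 9000 is an assumption):
#   my $listener = IO::Socket::INET->new(
#       LocalPort => 9000, Listen => SOMAXCONN, ReuseAddr => 1)
#       or die "listen: $!";
#   run_select_server($listener);
```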

    Finally, studying part of the code of aproxy (a small TCP port forwarder) could be useful too.

    btw which platform are you on?

    Update: hmm, then there is the question of your bidirectional protocol. It is easy to do that part wrongly. I would suggest a fork for each request to a connected client (you need that client's socket descriptor), in other words a fork for each server write; with COW schemes this does not have to be so costly. You could even prototype by doing a fork-exec of scripts in a server-request/ directory. It all depends on the average time of the request/answer (and your protocol). You'll also need to monitor the number of processes.

    cheers --stephan

Re: TCP Socket, Forking, Memory exhaustion
by weismat (Friar) on Nov 07, 2007 at 16:37 UTC
    I was browsing through the Stein book, as I was curious, and it mentions a very high overhead for the object-oriented socket implementation. I am not sure if this is still true. On my Solaris machine, one purely sleeping thread consumes about 1 MB of memory.

      It's probable that the vast majority of that 1MB of memory per dormant thread is being consumed by a wildly over-generous per-thread stack allocation that will likely never be used. See Use more threads.

      I've had 3000 active threads running (not doing much; incrementing a counter and displaying it at a particular location on a 200x100 console session, but more than sleeping), all in under 1GB.

      Greedy in C terms, but not bad for 3000 independent interpreters.



Node Type: perlquestion [id://649448]
Approved by Corion