TCP Server hangs with multiple client connections

rmahin has asked for the wisdom of the Perl Monks concerning the following question:

Long time reader, first time poster. We have just been completely stumped on this problem for a while now so thought the geniuses over here might be able to shed some light on the matter. So here is our problem. We have a server that needs to accept connections from possibly hundreds of clients, all of whom can issue commands with extremely long output. The program hangs when it gets too many connections at a time, or...for some other reason, that we cannot identify.

Ill give you an overview of how our program works, forgive me if some explanation is left unclear I am trying to break it down into code segments that will be useful. Essentially we have a server accepting connections from multiple clients which will be reading and writing from a SQLITE database (we may change to different database if it becomes a problem) and each client issues commands which the server then kicks off to a client thread, which is passed to a worker thread.

Here is the general flow of our code. Will post segments as needed, but my job does not particularly like us sharing lots of code, and we have written several modules.

We have our main server script. It starts by initializing 2 thread pool objects (which add/remove/reuse threads as needed) using the subroutine we specify. For instance:

$ref->{poolObj} = $clientPool;
exit 1 if ($clientPool->startPool(baseThreads => 10, moduleName => "SE
+RVER::Main", subName => "clientThreadPool", values => $ref));
[download]

$ref used to have more values before, so we have left it as is for now.

One threadpool is for client connections and processing commands, and one is for doing the real work so the client can issue more commands while it is running. As the server receives new connections, each client connection is put into a thread queue with 2 values: it's socket as a file descriptor, and its ip address. This is shown in the code below.

# Open Server Socket
my $sock = IO::Socket::INET->new(Proto => 'tcp',
                                    LocalAddr => $servername,
                                    LocalPort => $port,
                                    Listen => SOMAXCONN,
                                    Reuse => 1,
                                    Blocking => 0);

our $selSock = IO::Select->new($sock);

#some more code is here, like initializing our db connection, and log4
+perl logger 

while(my @ready = $selSock->can_read) {
        foreach $_ (@ready) {
        if ($_ == $sock) {
            $logger->info("Base socket is " . $sock);
            $logger->info("Adding new socket");
            #create a new socket
            my $newSock = $sock->accept or die;
            $logger->info("Added new socket " . $newSock);
            $selSock->add($newSock);
        
            my $fd = fileno $newSock;
            my $address = inet_ntoa($newSock->peeraddr);
            
            # enqueue client connection to be processed
            my @values = ($fd, $address);
            if ($clientPool->queueJob(queueValues => \@values)) {
            $logger->error("Failed to queue client connection");
                
                # trash packets received from client
                my $trash = <$newSock>;
                
                send($newSock, "*** ENQUEUE_FAILED ***", 0);
                send($newSock, "*** ZERO_BYTES_LEFT ***", 0);

                $selSock->remove($newSock);
                    
                shutdown($newSock, 2);
                close $newSock;
            }
        }
    }
}
[download]

Each thread in our clientPool then is then dequeuing connections that are enqueued by the server, containing the socket file descriptor, and the ip address of the client for communication and reading input from the client. We then read from this file descriptor like this which works well:

        my $fd = @$work[0];
        our $clientaddr = @$work[1];
        my $selSock = $main::selSock;
        
        # open file handle to read from... if fail to read error
        # error can not be reported back to end user
        # check has been implemented so dev can identify problem by er
+ror msg in server log
        unless (open $fh, '+<&=' . $fd) {
            $logger->error("Unable to open file handle for socket: $!"
+);
            $vars{poolObj}->setDone;
            next;
        }

        # Get the password sent by client
        my $clientpass = <$fh>;
[download]

and check a few more values the clients sends the server. After the checks are done, we are ready to actually process commands.

        # create and add socket file handle so it can be checked withi
+n the while loop
        my $select = IO::Select->new();
        $select->add($fh);
        
        my $last = 1;
        while ($last) {
            my $subroutineCall;
            my $command;
            my $completeLine = "";
            
            # check if anything is ready to be received from client
            my @ready = $select->can_read(.1);
            next unless (@ready);
            
            # loop while receiving from the client untill a full msg h
+as been received as identified by our flags
            while (defined (my $line = <$fh>)) {
                if ($line =~ /\*\*\* CLOSE_CONNECTION_CLIENT \*\*\*\n$
+/) {
                    $logger->info("Client closed connection to the ser
+ver");
                    $last = 0;
                    
                    $selSock->remove($fh);
                    shutdown($fh, 2);
                    close $fh;
                    last;
                }
                        
                $completeLine .= $line;
                if ($completeLine =~ /(.+) \*\*\* COMMAND_INPUT \*\*\*
+ (.+) \*\*\* COMMAND_INPUT_END \*\*\*\n$/is) {
                    $subroutineCall = $1;
                    $command = $2;
                    $logger->info("Processing message \"" . $command .
+ "\"");
                    
                    last;
                }
            }

            if ($last) {
            
                #checking if the subroutine call is defined, if not, i
+t kills that client connection (for cases of closing out of a client 
+window and it sending the empty string)
                if(defined($subroutineCall)){
                $logger->debug("Calling subroutine '${subroutineCall}'
+");
                    
                    my ($string) = executeCommand(subCall => $subrouti
+neCall, command => $command);
                                            
                    send($fh, "$string\n*** ZERO_BYTES_LEFT ***" , 0);
                }else{
                    $logger->info("Undefined subroutine passed, closin
+g client connection.");
                    $last = 0;
                    
                    $selSock->remove($fh);
                    shutdown($fh, 2);
                    close $fh;
                    last;
                }
            }
            
        }
[download]

So this is the main flow of our program. The subroutine executeCommand() uses the modules we have written to interact with our program, and log the output. Each command will be accessing our database getting varrying amounts of information, and updating the table to show jobs in use. For some commands our database must be locked for a good couple minutes or two. We are using the 'begin immediate transaction' to allow other clients to still be able to read from the database. This could be a source of problem, but right now we are thinking it lies elswhere. There two thing we have thought of that COULD be the problem with our code, but all attempts on our own to fix have turned up with very little.

1. log4perl. We write our info messages to the screen, and debug messages to a file. We have LOTS of debug messages. The output of some commands that we run can be well over 50,000 lines of text. We think this could be an issue of multiple threads are attempting to print to the same place at the same time because if we print these messages to the screen, our program hangs extremely quickly, whereas if we keep it to the info messages, things run mostly smoothly. Another sign that this could be a problem, heres a scenario:

we kick off a script to open 80 connections at once, with a client already connected. opening the 80 connections will hang the program if debug messages print to the screen. No new clients can connect to the server. The client already connected however, can issue commands, and have debug messages printed to our log file up to the point of actually receiving the command, but then nothing. nothing is printed to the screen here, only the file. which just seems weird!

We do seem to be running into a memory problem, but we should be able to track that one down, and would difficult for anyone here to do so without complete access to our code. but if you have any ideas on that matter, we'd be happy to hear them.

2. When we execute out commands, we are just doing system calls, but need the output. We originally used backticks to execute our command, but we have now switched to using open() and piping the output to a filehandle. The subroutine also looks for expected output to return to the user and if not found returns the last N lines of the command. that code is seen here, and nearly all of the time when our program hangs, it is around this point.

    # execute the command and save results in a file handle to reduce 
+memory usage
    if (($head =~ /.+/) && ($tail =~ /.+/)) {
        my @lastLines = ();
        my $length;
        open (my $fh, $vars{cmd} . " 2>&1|") or die;
        while (<$fh>) {
            #$logger->info($fh . " " . $_);
            if ($_ =~ /$tail/ims) { #found the last line, terminate lo
+op, do not append the output
                $tailFound = 1;
                last;
            }
            
            if($headFound){ #the header line is already found and igno
+red, so safe to apprend output
                $output .= $_;
            }
            
            if (!$headFound and $_ =~ /$head/ims) { #next lines will b
+e the matched output
                $headFound = 1;
            }
            
            #keeps track of the most recent N lines
            push(@lastLines, $_);
            my $length = @lastLines;
            if($length >= $vars{lastNLines}){
                shift(@lastLines);
            }
            
            #$output .= $_;
        }
        
        unless ($headFound && $tailFound) {
            $output = "\n\nCould not match command output. Printing th
+e last $vars{lastNLines} lines:\n";
            $output .= join("", @lastLines);
        }
        close ($fh);
    } else {
        open (my $fh, $vars{cmd} . " 2>&1|") or die;
        while (<$fh>) {
            #$logger->info($fh . " " . $_);
            $output .= $_;
        }
        close ($fh);
    }
[download]

I really appreciate any help/advice anyone is able to give, so thank you so much in advance for taking the time to look at it!

UPDATE: Thanks everyone for your ideas, in the process of trying them out now. Also forgot to mention that this server is running on windows, using activestate perl. In case that affects any of your suggestions.

UPDATE 8/29/2012: Sorry for the late response everyone. The program took some time to debug and to try various suggestions. We eventually found the problem was log4perl, and for some reason when printing to STDOUT, the program hung. The problem was fixed by printing to STDERR instead. Not sure exactly why this is the case, but it seems to be working now...Thanks again for all your posts!

Back to Seekers of Perl Wisdom