
Apache Processes Hung on Socket Issue

by eallenvii (Novice)
on Jul 03, 2013 at 19:04 UTC ( #1042256=perlquestion )
eallenvii has asked for the wisdom of the Perl Monks concerning the following question:


Hello all. I'm having a rather peculiar problem, and while I know how to fix it (i.e., I changed the code and the problem went away), I don't understand WHY it was behaving poorly in the first place. The facts:
  1. Apache 1.3.41
  2. mod_perl 1.31
  3. Perl 5.10.1
  4. OS: CentOS 5.2 and RHEL 5.8 (same result on both)
Under mod_perl I have some code that makes a TCP connection to a server:
    eval {
        local $SIG{ALRM} = sub { die 'timed out' };
        alarm($connect_timeout);
        $socket = IO::Socket::INET->new(
            PeerAddr => $hostname,
            PeerPort => $port,
            Proto    => 'tcp');
        alarm(0);
    };
I understand that using alarm like this is normal, and I believe that at the time this code was written IO::Socket::INET did not properly implement its Timeout argument, which is why the original author used alarm.
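For reference, here is a minimal sketch of the alarm-around-connect pattern described above, written so that the timer is cancelled on every exit path. The helper names `with_alarm` and `connect_with_alarm` are hypothetical and not from the original code; the trailing newline in the die message follows the idiom shown in `perldoc -f alarm`. It is an illustration of the idiom only, not a diagnosis of the hang.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not from the original post): run $code under an
# alarm() deadline. Returns ($result, undef) on success, or
# (undef, $error), where $error is "timed out\n" if the deadline fired.
sub with_alarm {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # The trailing "\n" stops Perl appending "at FILE line N".
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);    # cancel the timer on the success path
        1;
    };
    my $error = $@;
    alarm(0);        # also cancel it if $code died for another reason
    return $ok ? ($result, undef) : (undef, $error);
}

# Hypothetical wrapper mirroring the connection code in the question.
sub connect_with_alarm {
    my ($hostname, $port, $connect_timeout) = @_;
    return with_alarm($connect_timeout, sub {
        IO::Socket::INET->new(
            PeerAddr => $hostname,
            PeerPort => $port,
            Proto    => 'tcp',
        ) or die "connect failed: $@\n";
    });
}
```

A caller would write `my ($socket, $error) = connect_with_alarm($hostname, $port, $connect_timeout);` and test `$error` with a match such as `/timed out/` to tell a timeout apart from other failures.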

The Problem

Basically, I have an Apache server running with 120 processes (the MaxClients directive). We have a healthcheck script on the server that tests several application-dependent things to verify that the server is in a generally good state. Two of those checks go over the network. Over time, looking at server-status and ganglia, the server would see a continual rise in the number of processes stuck in the "W" ("sending reply") state. Running strace on such a process would reveal an entry waiting forever: futex(0x46fa94, FUTEX_WAIT, 2, NULL. Eventually, and obviously, the server would stop taking requests altogether and need a restart. What we determined was that one request would time out connecting to the target host, and on the next reuse of that process it would hang forever in the wait state at the point where it tried to create another outbound socket.

The Fix

Given that IO::Socket::INET now handles Timeout properly, I switched the code from using an external alarm to using the constructor argument:
    eval {
        local $SIG{ALRM} = sub { die 'timed out' };
        # on any subsequent network/socket usage (if thread previously timeout)
        $socket = IO::Socket::INET->new(
            PeerAddr => $hostname,
            PeerPort => $port,
            Timeout  => $connect_timeout,
            Proto    => 'tcp');
    };
Note that the text 'timed out' is used later in the script to do some logic. This not only fixed the issue of processes being fubar'ed when creating another socket after a timeout; the timeouts themselves just disappeared. In fact, I had to point the code at a host that drops packets in order to force a connection timeout.
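One point worth spelling out about the Timeout variant: since no alarm() is scheduled, the ALRM handler never fires, and a failed connect is reported by new() returning undef with the reason left in $@ rather than by an exception. Below is a minimal sketch of checking for that; the helper name `connect_with_timeout` is hypothetical. The exact wording of the message in $@ may differ between IO::Socket versions, so the sketch returns it as-is instead of matching on it.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not from the original post): connect using the
# constructor's own Timeout argument instead of alarm().
# Returns ($socket, undef) on success or (undef, $reason) on failure.
sub connect_with_timeout {
    my ($hostname, $port, $connect_timeout) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $hostname,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $connect_timeout,
    );
    return ($socket, undef) if $socket;

    # new() does not die on failure: it returns undef and sets $@.
    return (undef, "$@");
}
```

Any later logic that looks for the literal text 'timed out' would need to be checked against whatever $@ actually contains on the installed version.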

The Actual Question

What in the world is happening with the old approach that would cause this to happen to the processes? Google searches did me no good, and while I have a fix in place I'd love to understand the why. Thanks, and let me know if I missed something that may be useful.

Replies are listed 'Best First'.
Re: Apache Processes Hung on Socket Issue
by vsespb (Chaplain) on Jul 03, 2013 at 20:33 UTC
    local $SIG{ALRM} = sub { die 'timed out' };
    The above is the connection timeout.
    Timeout => $connect_timeout,
    The above is the socket timeout.
    You need both.
      I'm not sure what the difference is, but is it much of a leap in logic to say that cutting off the socket creation with a connection timeout would support my theory of the thread being left in a state where it is unable to create further sockets?

        The Internet is full of recommendations to use SIGALRM around the socket connect, and as far as I remember, the socket "Timeout" option was not working right for the connection stage (and/or for DNS timeouts). (Not sure about current IO::Socket versions, though.)

        Also, SIGALRM does not affect socket read/write timeouts that happen after alarm(0).

        thread being left in a state unable to create further sockets?
        We would probably need to see more code: what happens before and after the eval?
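To illustrate the point above that an alarm cancelled with alarm(0) does nothing for later reads, here is a minimal sketch of bounding a single read with IO::Select instead. The helper name `read_with_timeout` is hypothetical, and the sketch assumes the caller is happy to treat "nothing arrived in time" and "read error" the same way.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper (not from the original post): read up to
# $max_bytes from $socket, waiting at most $seconds for data.
# Returns the data ('' on EOF), or undef on timeout or read error.
sub read_with_timeout {
    my ($socket, $max_bytes, $seconds) = @_;

    # can_read() returns the handles that became readable in time.
    my @ready = IO::Select->new($socket)->can_read($seconds);
    return unless @ready;

    my $bytes = sysread($socket, my $buffer, $max_bytes);
    return unless defined $bytes;
    return $buffer;
}
```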
Re: Apache Processes Hung on Socket Issue
by andal (Hermit) on Jul 04, 2013 at 07:36 UTC

    Just guessing. As far as I know, Apache uses multi-threading, and with multi-threading signals are not reliable. The signal is delivered to some thread, not necessarily to the specific one that is waiting for it. As a result, the alarm won't work, because the signal it requested is delivered to the wrong thread.

    Maybe the Timeout parameter to IO::Socket::INET uses some other mechanism (for example, a call to select).
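For readers unfamiliar with the select-based alternative guessed at above, this is a minimal sketch of a connect timeout that needs no signals: a non-blocking connect followed by a bounded wait for writability. It shows the general technique only and is not taken from IO::Socket's source; the helper name `connect_nonblocking` is hypothetical, and it assumes IPv4 and that the name lookup itself (which still blocks) is acceptable.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP SOL_SOCKET SO_ERROR
              inet_aton pack_sockaddr_in);

# Hypothetical helper (not from the original post).
# Returns ($socket, undef) on success or (undef, $reason) on failure.
sub connect_nonblocking {
    my ($hostname, $port, $seconds) = @_;

    # Note: inet_aton() may do a blocking DNS lookup.
    my $ip = inet_aton($hostname)
        or return (undef, "cannot resolve $hostname");
    socket(my $sock, PF_INET, SOCK_STREAM, IPPROTO_TCP)
        or return (undef, "socket: $!");
    $sock->blocking(0);

    if (!connect($sock, pack_sockaddr_in($port, $ip))) {
        # EINPROGRESS means the connect is still under way.
        return (undef, "connect: $!") unless $!{EINPROGRESS};

        # Wait until the socket is writable, or give up after $seconds.
        my @ready = IO::Select->new($sock)->can_write($seconds);
        return (undef, 'timed out') unless @ready;

        # Writable does not mean connected: check the pending error.
        my $so_error = unpack('i', getsockopt($sock, SOL_SOCKET, SO_ERROR));
        if ($so_error) {
            local $! = $so_error;
            return (undef, "connect: $!");
        }
    }

    $sock->blocking(1);    # hand back an ordinary blocking socket
    return ($sock, undef);
}
```

Because nothing here depends on signal delivery, the approach behaves the same whether or not the server process is threaded.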

      Actually, Apache 1.3.x which eallenvii is using doesn't use threads. It is very old compared to modern versions of Apache (current version is 2.4.4) which do allow threaded operations.

      I'm actually rather impressed that eallenvii is managing to run RHEL 5 with a newer version of perl and much older versions of Apache and mod_perl. That's quite a combination.

Node Type: perlquestion [id://1042256]
Front-paged by Corion