http://www.perlmonks.org?node_id=436982

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I've written a little mod_perl application that takes an RSS/RDF/Atom feed or a Usenet newsgroup by name and converts it to HTML output, via some custom XSL stylesheets I've written. It uses XML::LibXML and XML::LibXSLT, as well as a smattering of XML::RSS and Net::NNTP.
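In outline, the transform is nothing exotic. Here's a stripped-down sketch of the pipeline (the file names are placeholders, not the real code, which fetches the feed and picks a stylesheet per feed type):

    use strict;
    use warnings;
    use XML::LibXML;
    use XML::LibXSLT;

    # Placeholder inputs; the real app selects these per request.
    my $parser     = XML::LibXML->new;
    my $xslt       = XML::LibXSLT->new;
    my $source     = $parser->parse_file('feed.rss');
    my $style_doc  = $parser->parse_file('rss-to-html.xsl');
    my $stylesheet = $xslt->parse_stylesheet($style_doc);
    my $result     = $stylesheet->transform($source);
    print $stylesheet->output_string($result);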

So far, so good, and it works great. The output looks great too: clean and effective. The target (as with much of my Perl and screen-scraping code) is HTML to be converted for viewing on a handheld through Plucker.

But there's a problem...

Every once in a while, at unpredictable intervals, running the script to convert a feed to HTML will cause the Apache child servicing that request to segfault. It happens repeatedly, but randomly, meaning I can refresh the same POSTDATA and it will work some of the time and segfault Apache at other times.

[Sat Mar 5 19:06:12 2005] child pid 13307 exit signal Segmentation fault (11)
[Sat Mar 5 19:12:23 2005] child pid 8125 exit signal Segmentation fault (11)
[Sat Mar 5 19:15:41 2005] child pid 8123 exit signal Segmentation fault (11)

Normally, this wouldn't be a problem, because Apache would just spawn another child to replace the one that segfaulted, but I'm running Apache behind Squid as an accelerator, and when an Apache child segfaults, Squid loses the handle and drops the socket.

Before you ask, I'm using strict, warnings and diagnostics. There are no closures. Everything that could possibly be causing this has been checked and checked again. The code itself runs very clean, and it is probably one of the best pieces of Perl I've written to date.

The weirder piece of this is that it only segfaults Apache when it's running as a mod_perl application. It does not segfault when running under mod_cgi, in the same directory. Of course, running it under mod_cgi is about 80% slower, so that's not an option if I want to launch this tool publicly. I even tried using Memoize to gain a bit more control over the way functions are evaluated, but that didn't seem to improve matters at all.

I've tried Apache 1.3.33 as source-built, Debian packaged, and on FreeBSD. I've tried threaded and non-threaded apache+mod_perl builds (-lpthread and USE_THREADS=1). I've tried DSO and static. I've tried separating the mod_perl and mod_php servers into their own instances on separate ports, with physically-separate config files. Still no success in stopping the segfaults.

I've also tried using Apache 2.0.53 in the same configuration, threaded, non-threaded, DSO, static, shared instances, separate instances, etc. Again, no luck.

I've tried using stock linuxthreads and also NPTL, with the same negative results.

After days of frustration, I tried another wild approach, with Apache 2.1.3 sitting behind Squid, acting as a ProxyPass agent, talking to two physically-separate instances of 1.3.33 running on separate ports. Still no luck.

So I'm at an impasse. I don't quite know how to figure out why this is happening, or how to avoid it. I even went through the Apache debugging steps one by one, and still could not find a reproducible way to get it to crash in the same place, so I could identify which Apache or mod_perl interface has the bug. I've run it through gdb and strace hundreds of times (yes tye, I've used 'attach' here <grin>), but it keeps dying in different places; it's very inconsistent.

The last step, before I finally give up and write this in another language, is to go through the mod_perl Porting Guide that belg4mit referred me to, and see if there's any last things I might have missed.

Is there a way to get even more granular with this, to really see if there are some hidden globals I can't see, or closures not being reported by strict/warnings/taint/diagnostics? It's just about driving me mad now, 5 days and counting, so I'd like to solve it and move on to the next project.

HELP!

Re: mod_perl go boom, mod_cgi works
by Tanktalus (Canon) on Mar 06, 2005 at 04:14 UTC

    You haven't mentioned which version of perl you're using. I've noticed memory problems that went away when upgrading from 5.6.x to 5.8.x. Specifically, I was using XML::Twig, which, of course, uses XML::Parser, which I'm sure at least some of your modules use. I suppose the overhead in all the XML handling that I was doing managed to confuse 5.6, but those problems were fixed in 5.8.

    So, that's my first offering. My experience crashed in inconsistent locations too, but abstractly it was consistent: every time I ran my program. My program took an existing ~50KB XML file and added data from another source to make a multi-hundred-KB XML file (actually, I have a hard time remembering, but I'm guessing it was multi-MB at the end). This may be your problem: over multiple requests, you're chewing up too much memory, and some of it may be memory junked inside the expat XML-parsing C library.

    Thus, one solution is to upgrade perl, if you haven't already gone to 5.8.3+. Another is to reduce the number of requests each Apache child handles before exiting, so that it exits before the problem memory chunks accumulate to the point of a crash. This latter idea is not dissimilar to rebooting NT4 servers nightly to keep them from crashing after about 40 hours of uptime, which I seem to recall was a popular problem. It's not the right solution (the server should be fixed; in this case, the bug probably lives between Perl and expat), but it offers a useful workaround until then.
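    In httpd.conf, that's a single directive; the value below is only an example, tune it to your traffic:

        # Recycle each child after this many requests, so leaked memory
        # is reclaimed before it accumulates to the point of a crash.
        MaxRequestsPerChild 100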

      "Another is to reduce the number of requests each Apache child can handle before exiting, thus exiting before the accumulation of problem memory chunks to the point of a crash."

      This idea strikes me in two ways...

      1. The code or running application is not freeing memory or globals at each iteration...
      2. This is just a workaround for badly-written code that is incorrectly using globals or some other shared structures.

      That being said, I've limited MaxRequestsPerChild in Apache to 2, and so far I am not able to reproduce the problem. I'm not confident this has actually "solved" the problem, however...

      One thing I did notice, though: when I'm clicking through my sample feeds linked from the front page of this mod_perl application, sometimes my click for 'foo.bar.xml' will report that 'example.com.rdf' does not exist. example.com.rdf is a feed I might have clicked last, or 10 clicks ago. It seems random, but it leads me to believe there is some persistence going on here. A clue?
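      If it is persistence, I wonder whether it's the classic Apache::Registry trap: a file-scoped lexical captured by a named sub. A minimal hypothetical illustration (not my actual code):

          use strict;
          use warnings;
          use CGI qw(param);

          # Under Apache::Registry the whole script is wrapped in a sub,
          # so show_feed() closes over the $feed from the request that
          # first compiled it; later requests in the same child can then
          # see a stale value, without strict or warnings complaining.
          my $feed = param('feed');

          sub show_feed {
              print "Content-type: text/plain\n\n";
              print "Rendering $feed\n";
          }

          show_feed();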

      Is there a more detailed way to find out if I'm using globals somewhere, that my old eyes can't see?
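      The only thing I've thought of so far is walking the symbol table, something like this sketch ('MyApp' stands in for my real package name):

          use Devel::Symdump;

          # Recursively enumerate package variables under the namespace.
          my $symbols = Devel::Symdump->rnew('MyApp');
          print "$_\n" for $symbols->scalars, $symbols->arrays, $symbols->hashes;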

      Thanks for the suggestion.

      Unfortunately, I'm already running 5.8.4, and I've taken the time to rebuild, cleanly, every single module used in this chain from the latest available upstream sources. I was specifically pedantic about making sure EVERY test passed, 100%.

Re: mod_perl go boom, mod_cgi works
by perrin (Chancellor) on Mar 06, 2005 at 04:40 UTC
    If you can reliably reproduce a segfault, people on the mod_perl mailing list can help you fix it. See this for instructions on generating a useful backtrace file. If you have trouble reproducing it, try running in single process mode, using the -X flag.
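    The usual recipe is to run the single-process server under gdb and grab a backtrace when it faults; roughly this, with paths assumed and output abbreviated:

        $ gdb /usr/local/apache/bin/httpd
        (gdb) run -X
        ... replay the request that crashes ...
        Program received signal SIGSEGV, Segmentation fault.
        (gdb) bt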
      Part of the problem with running it in single-process mode (via -X) is that after some random interval, httpd stops responding to requests for the content. I can click on links on the mod_perl page, and the hourglass flickers for a second, then goes away, and the logs on the server side don't show a hit at all.

      It just seems to drop the request somewhere between my client and the single-process httpd when I do that. I even tried running it in "non-single-process" mode, with a MaxClients setting of 1, and it does the same thing.

      It's INCREDIBLY hard to track down (hence my reason for posting).

      I'll take a look at the link you've provided to see if maybe there's something I missed, some way to get a proper core file out of here that I can use.

Re: mod_perl go boom, mod_cgi works
by saintmike (Vicar) on Mar 06, 2005 at 03:23 UTC
    You've already done a lot of investigative work; here are two more suggestions. Did you take a look at the core file with gdb? Even if it dies in different places, that might reveal a pattern. Also, is memory usage of the child growing over time?
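    For the memory question, one quick check is to log the child's size from inside the handler on every request; GTop is one option (sketch only):

        use GTop ();

        # Log this child's process size; steady growth across requests
        # points at a leak somewhere in the request cycle.
        my $gtop = GTop->new;
        warn sprintf "pid %d size: %d bytes\n", $$, $gtop->proc_mem($$)->size;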

    Oh, and you were running Apache with just a single child, right? Also, you might want to capture all web input to a child until a crash, dump it, and replay it; the problem should be reproducible then.
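    Replaying is the easy half once you have a capture. A sketch with LWP, where the URL and the one-body-per-line capture format are assumptions:

        use strict;
        use warnings;
        use LWP::UserAgent;

        # Replay captured POST bodies, one per line in requests.log,
        # against a test instance until it segfaults.
        my $ua = LWP::UserAgent->new;
        open my $log, '<', 'requests.log' or die "requests.log: $!";
        while (my $body = <$log>) {
            chomp $body;
            my $res = $ua->post('http://localhost:8080/feed2html',
                                Content => $body);
            print $res->code, "\n";
        }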

Re: mod_perl go boom, mod_cgi works
by The Mad Hatter (Priest) on Mar 06, 2005 at 16:01 UTC
    Could you post the code here or put it somewhere we could get at it? That would make it easier to review for pitfalls.
      Unfortunately, I can't post the full code, but I can mock up a pseudo-code example of exactly what I'm working with, which should be clear enough to find any tricky spots.
        Why can't you post the full code? Pseudo-code is unlikely to be much good in this case, but post it anyway, just to cover your bases.
Re: mod_perl go boom, mod_cgi works
by jdalbec (Deacon) on Mar 07, 2005 at 01:58 UTC
    It sounds like you might be getting into an infinite loop that gradually exhausts memory. Try running in single-process mode again, but this time, after it stops responding, let it keep running for a while and see if you get a crash.
Re: mod_perl go boom, mod_cgi works
by jk2addict (Chaplain) on Mar 07, 2005 at 04:05 UTC

    Just a thought, but have you tried building Apache with --disable-rule=EXPAT? I know AxKit (which uses LibXML/LibXSLT) segfaults under mod_perl when expat is compiled into Apache. In fact, it's a known warning/check when compiling/installing AxKit.

    Most Apache ports in the FreeBSD ports tree have a Makefile knob for this too: WITHOUT_APACHE_EXPAT.
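    For a source build of Apache 1.3.x, that's just a configure flag (other options elided):

        # Build Apache 1.3.x without the bundled expat rule
        ./configure --disable-rule=EXPAT ...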

      Does this remove expat support? Or just rely on the external expat system library/headers instead?

        To be honest, I have no idea. Here's the mention on AxKit:

        There are currently some incompatibilities between the versions of expat loaded by Apache when compiled with RULE_EXPAT=yes (which is a default, unfortunately), and XML::Parser's copy of expat. This can cause sporadic segmentation faults in Apache's httpd processes. The solution is to recompile Apache with RULE_EXPAT=no (later Apache's have implemented this as --disable-rule=expat). If you have a recent mod_perl and use mod_perl's Makefile.PL DO_HTTPD=1 to compile Apache for you, this option will be enabled automatically for you.