SAX filter in mod_perl

by ironchicken (Novice)
on May 04, 2012 at 18:49 UTC ( #968969=perlquestion )
ironchicken has asked for the wisdom of the Perl Monks concerning the following question:

My application includes a SAX filter which parses a simple markup language into XML elements and which is executed as part of an XML::SAX::Machines Pipeline within a mod_perl2 handler.

I'm finding that the characters method of my filter emits characters for only two HTTP requests after Apache is started and, for any subsequent HTTP requests, it does not emit characters, although it does continue to emit elements correctly.

I checked the logic of my filter quite carefully and when everything seemed fine, I tried checking to make sure that XML::SAX::Base was receiving the character data my filter was emitting. I did this by altering XML::SAX::Base's characters method thus:

    sub characters {
        my $self = shift;
        print "\nXML::SAX::Base::characters Received DATA: |" . $_[0]->{Data} . "|\n";
        if (defined $self->{Methods}->{'characters'}) {
            $self->{Methods}->{'characters'}->(@_);
        }
        else {
            my $method;
            my $callbacks;
            ...

i.e., I inserted that print statement.

I've found that, when characters are successfully emitted, the print statement gets executed twice. But when the filter stops emitting characters, that statement gets executed only once.

The filter's characters implementation parses the supplied character data, looks for instances of the simple markup language elements and emits chunks of the original character data along with newly generated XML elements.

The filter overrides XML::SAX::Base's start_element method like this:

    sub start_element {
        my ($self, $element) = @_;
        $self->{parsing_markup} = allow_markup($element->{Name});
        $self->SUPER::start_element($element);
    }

Here, allow_markup is a function which determines whether the simple markup language should be applied to the content of a particular element in the source XML.

There is an implementation of characters like this:

    sub characters {
        my ($self, $chars) = @_;
        if ($self->{parsing_markup}) {
            $self->parse_markup($chars->{Data});
        }
        else {
            $self->SUPER::characters({Data => $chars->{Data}});
        }
    }

This either sends the character data to parse_markup or just hands it on to XML::SAX::Base's characters method.

The parse_markup method is quite complicated, but its functioning boils down to a mixture of $self->SUPER::start_element, $self->SUPER::end_element, and $self->SUPER::characters calls. The start_element and end_element calls are very likely to be correct as I always get the appropriate tags in the output. But there could be something going awry with the characters calls as this is where the data is going missing.

The call to $self->SUPER::characters looks like this:

    my $c = {Data => substr $chars, $from, $upto - $from};
    unless ($upto - $from <= 0) {
        print "\n=> calling SUPER::characters with " . Dumper($c) . "\n";
    }
    $self->SUPER::characters($c) unless ($upto - $from <= 0);

This includes some more debugging output: the conditional print call. That output is always as I would expect.
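
For readers following along, the chunking described above can be sketched roughly as follows. This is only an illustration, not the original parse_markup: the markup syntax (*text* becoming an em element) and the $emit callback standing in for the $self->SUPER:: calls are both assumptions, since the real markup language and method body are not shown.

```perl
use strict;
use warnings;

# Hypothetical markup: *text* becomes an <em> element. The real markup
# language in the question is not shown, so this syntax is an assumption.
# $emit is a stand-in for the $self->SUPER::... calls in a real filter.
sub parse_markup {
    my ($chars, $emit) = @_;
    my $from = 0;
    while ($chars =~ /\*([^*]+)\*/g) {
        # Capture match offsets and content before calling anything else.
        my ($upto, $next, $inner) = ($-[0], $+[0], $1);
        # Emit the plain text before the match, skipping empty chunks.
        $emit->(characters => substr $chars, $from, $upto - $from)
            unless $upto - $from <= 0;
        $emit->(start_element => 'em');
        $emit->(characters    => $inner);
        $emit->(end_element   => 'em');
        $from = $next;    # resume after the match
    }
    # Emit whatever follows the last match.
    $emit->(characters => substr $chars, $from) if $from < length $chars;
    return;
}

# Collect the events as "type:value" strings instead of calling a handler.
my @events;
parse_markup('a *b* c', sub { push @events, join ':', @_ });
```

Note that $chars is passed in as an argument and every offset lives in a lexical declared inside the sub, so nothing persists between calls (or between requests in a long-running interpreter).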

I'm fairly sure that this must have something to do with Apache or mod_perl. But I'm now at a loss as to how to debug further. Any suggestions?

Perl: v5.14.2; mod_perl: 2.0.5; Apache: 2.2.22; XML::SAX::Base: 1.07; all installed from Debian packages from the unstable archive.

Replies are listed 'Best First'.
Re: SAX filter in mod_perl
by Anonymous Monk on May 04, 2012 at 19:45 UTC

      Of course, this would make answering the question a lot easier and would also help me to identify further problems. However, having spent the last hour trying, I can't reproduce the problem with a simplified version.

      Furthermore, I've actually now fixed the problem in the real application. Unfortunately, I'm not sure exactly what it was. I had a nested subroutine which was calling substr on a string variable from its outer scope; this variable contained the character data being chunked. I tried moving the procedure encoded in that subroutine into inline statements of the enclosing subroutine, thus dispensing with the nested subroutine, and now the problem does not occur. Possibly it was some sort of scoping problem, such as an unintended closing over that string variable?
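
One way such a scoping problem can arise is sketched below, as an assumption rather than a diagnosis of the original code. If the nested subroutine was a named sub, Perl compiles it once and binds it to the first instance of the outer lexical (warning "Variable ... will not stay shared" at compile time when warnings are on), so later calls keep seeing stale data. The names chunk_named, piece_named and chunk_anon are hypothetical.

```perl
use strict;
use warnings;    # perl warns here: Variable "$chars" will not stay shared

sub chunk_named {
    my ($chars) = @_;
    # Named nested sub: compiled once, bound to the FIRST $chars only.
    sub piece_named { return substr $chars, 0, 3 }
    return piece_named();
}

sub chunk_anon {
    my ($chars) = @_;
    # Anonymous sub: a fresh closure over the current $chars on every call.
    my $piece = sub { return substr $chars, 0, 3 };
    return $piece->();
}

# The second call to chunk_named still sees the data from the first call.
my @named = (chunk_named('alpha'), chunk_named('bravo'));    # ('alp', 'alp')
my @anon  = (chunk_anon('alpha'),  chunk_anon('bravo'));     # ('alp', 'bra')
```

In a persistent interpreter such as mod_perl, the outer subroutine is called again for every request, which is the situation where this behaviour shows up. Passing the string as an argument, or using an anonymous sub as in chunk_anon, avoids it.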


        If you don't know what caused the problem, I wouldn't claim that you've fixed it. It's certainly possible that the bug is gone, but it's also possible that the bug still exists, and simply isn't manifesting itself right now. It may just need a different pattern of data to cause some other odd symptom. If you have a suspicion about what the problem *might* have been, I'd encourage you to experiment a little bit more and try to reproduce the bug.

        If you can identify the problem then you can remove the bug with confidence. Additionally, you may learn something new about perl. I frequently find that tracking down bugs helps me either (a) learn more about the workings of perl, (b) figure out coding methods that are less error-prone, and/or (c) improve my ability to question my assumptions and locate bugs more easily.

        I'm sure you've got tasks to complete, and since your immediate problem is gone, there's a strong temptation to proceed with the next item in your list. I'm not going to tell you to pull a Captain Ahab and treat this bug as your Moby Dick, but I'd definitely put a little more time into trying to find the definite cause.



        When your only tool is a hammer, all problems look like your thumb.

Re: SAX filter in mod_perl
by Anonymous Monk on May 06, 2012 at 18:41 UTC
    For what it's worth, this is also why I don't like to use "mod_perl" for any heavy lifting, or for anything (like XML processing) that might eat up a lot of memory and/or do so unpredictably. I like to use FastCGI ... or even a request that is offloaded using the FastCGI protocol ... or POE or what-have-you from a mod_perl driver ... to do things like that. I don't like having a single Apache server process, having processed tens of thousands of "routine, small" Perl-based requests, suddenly get transformed into a million-pound gorilla by virtue of having directly handled such an out-of-the-ordinary request. I say, let the Apache processes be a user interface and nothing more.

Node Type: perlquestion [id://968969]
Front-paged by Corion