Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

[Solved] Request help for work around for bug in perl 5.10.1

by babel17 (Acolyte)
on Dec 12, 2012 at 19:31 UTC ( #1008575=perlquestion: print w/replies, xml ) Need Help??
babel17 has asked for the wisdom of the Perl Monks concerning the following question:

Let me start by saying that I know the correct thing to do is upgrade to a newer version of perl, but I request that you trust me on this - right now, upgrade is not an option.

The host OS is FreeBSD 8.0 ( and no, upgrading the OS isn't an option either. )

The problem is that perl sometimes panics when I try to open a UNIX domain socket to communicate with a C program.

main::die_catcher('Assertion ((svtype)((sv)->sv_flags & 0xff)) != 0xff + failed: f...') called at /usr/local/lib/perl5/5.10.1/mach/IO/Handle. +pm line 480 IO::Handle::autoflush('IO::Socket::UNIX=GLOB(0x802a5bff0)', 1) called +at /usr/local/lib/perl5/5.10.1/mach/IO/ line 44 IO::Socket::new('IO::Socket::UNIX', 'Peer', '/var/HA/myUNIXDomainSocke +t', 'Type', 1) called at /usr/local/lib/perl5/5.10.1/mach/IO/Socket/U line 23 IO::Socket::UNIX::new('IO::Socket::UNIX', 'Peer', '/var/HA/myUNIXDomai +nSocket', 'Type', 1) called at /usr/local/lib/perl5/site_perl/5.10.1/ line 290
The offending code is -
my $connection = IO::Socket::UNIX->new( Peer => $uds_path, Type => SOCK_STREAM );

So my question is, "Is anyone familiar with this bug?" I could find very little about it on the web ( mainly ), and nothing in the perl bug database. What I'm really looking for is the conditions that trigger it, as it is a fairly low frequency occurrence, but once it gets triggered, it seems to keep happening. Is it possible that it is triggered by the state of the other side of the socket? ( This seems unlikely, as it looks as though autoflush is set before trying to connect )

Failing controlling the conditions that trigger the bug, is there a reasonable workaround? The one that comes to mind is to open the socket inside of an eval. Your thoughts on the matter will be appreciated.

Edit: OK here is full diag
Assertion ((svtype)((sv)->sv_flags & 0xff)) != 0xff failed: file "sv.c +", line 5400 at /usr/local/lib/perl5/5.10.1/mach/IO/ line 48 +0.

Update: Solution

So in the end this turned out to be be programmer error. For complicated reasons, and simple ones (like stupidity), I was calling Sys::Syslog->syslog() after syslog had been closed. Strangely enough this all seemed to work fine until syslogd was SIGHUP'd by log rotation, and then things started to go wrong. Presumably something got terribly screwed up in the relationship between perl filehandles and C file descriptors.

Replies are listed 'Best First'.
Re: Request help for work around for bug in perl 5.10.1 (weak magic)
by tye (Sage) on Dec 12, 2012 at 20:11 UTC
    'Assertion ((svtype)((sv)->sv_flags & 0xff)) != 0xff failed: f...'

    Did you notice that you failed to ever include the actual full error message? It is a nice touch to include the call stack that you included. But I wouldn't be surprised if there is useful information that has been replaced by "f..." in what you've provided.

    I did a very quick look to see if I could find this assertion. I failed to find anything in your report that was much use in pointing to a specific assertion. But I did stumble upon this part of sv.h which seems to match the expression in the part of the error message that you preserved:

    /* Sadly there are some parts of the core that have pointers to alread +y-freed SV heads, and rely on being able to tell that they are now free. So + mark them all by using a consistent macro. */ #define SvIS_FREED(sv) ((sv)->sv_flags == SVTYPEMASK)

    And searching for SvIS_FREED in the 5.10.1 source code only turned up this candidate:

    /* delete a back-reference to ourselves from the backref magic associa +ted * with the SV we point to. */ STATIC void S_sv_del_backref(pTHX_ SV *tsv, SV *sv) { ... assert(!SvIS_FREED(av));

    Which I'm pretty sure is code that gets run when a weak ref is destroyed. But then the evidence that this is the assertion is rather indirect (but is enough to convince me -- after I did another search for "SVTYPEMASK" and found no candidates).

    Running "git grep -i weak perl-5.10.1 ext/IO" finds only the use of the word "tweak" in ext/IO/ChangeLog. So I don't see the weak ref being the fault of IO::Socket::* or IO::Handle (all of which live under ext/IO in the Perl source code).

    Are you using weak references somewhere? Not that I find it easy to imagine how your use of a weak reference could matter to the call to IO::Socket::Unix->new() that you showed.

    A significant number of cases of successfully fixing similarly mysterious bugs leads me to suggest the following technique (which I've seen succeed significantly more often than I've seen it fail, somewhat astonishingly):

    Look through the modules that you are making use of and find the one that makes use of the most "magic". In particular, modules that make use of XS are especially good candidates. Switch to a newer or older version of that module. Even better, stop using that module. You can often replace such modules with similar modules that don't use XS.

    Often this will make the error go away or just happen even less frequently. Other times, I've seen it just change the error to something that is easier to debug (sometimes because it happens much more frequently and so can be reproduced in an environment where more details can be extracted). Getting rid of XS modules usually just makes the error go away (astonishingly often).

    Update: Using threads is also an easy way to trigger mysterious errors, IME. (It isn't that you can't successfully use Perl "threads"; it is that is way too easy to get bizarre error messages when using threads, errors that just go away when you stop using "threads".)

    - tye        

      That assert also appears in sv_clear(). But as you say, truncating the error message is annoying. Normally an assert tells us the C source file and line number of the error.


        That assert also appears in sv_clear().

        Indeed! In fact, that assertion is likely the culprit. I previously missed that line from my "git grep SVTYPEMASK perl-5.10.1" output.

        But now I realize that the assertion that I pointed out can't be the one that is to blame because it uses "!(...==...)" which might well get changed by the optimizer into "...!=..." but such an optimization would not impact the string that the C preprocessor produces and that gets included in the error message.

        So the "weak ref" stuff was a red herring.

        So this bug is a little more mundane. We have a scalar with a ref count of 0 that has been freed that is trying to be cleared again. Eliminating XS modules (or up- or down-grading them) can still be useful, but is not as likely to matter as with the more mysterious scenario of a rogue weak ref being destroyed, IME.

        The smell of this bug is becoming more similar to "subtle bugs in Perl that only seem to appear in long-running Perl daemons". So the bug might stop happening if the daemon can be made to run something like:

        exec( $^X, $0, @ARGV ); die "Couldn't exec self: $!";

        Every so often (like once per day).

        Update: Use of "threads" still seems plausible as a major contributing factor.

        - tye        

      So I apologize for the lack of clear error messages. The code is detached, so I have to trap __DIE__ and write to syslog. This is done as follows:
      $SIG{__DIE__} = \&die_catcher; ... # # die_catcher # # Try to log the call stack if we die # sub die_catcher { for my $caller ( CallStack() ) { alog(LOG_ERR, $caller); } } ... # # Courtesy of Perl Monks # sub CallStack { local $@; eval { confess( '' ) }; my @stack = split m/\n/, $@; shift @stack for 1..3; # Cover our tracks. return wantarray ? @stack : join "\n", @stack; }
      I'm not really that familiar with Carp, I'll see what I can figure out to make it give a more complete backtrace.
        I'm not really that familiar with Carp, I'll see what I can figure out to make it give a more complete backtrace.

        The problem wasn't that the backtrace was unacceptable. The problem was that you simply failed to log @_ (really just $_[0]), which is the most important thing to log when logging inside of a __DIE__ handler.

        Luckily, the backtrace also happened to include a truncated version of that message. But the solution isn't to try to get the backtrace to not truncate. The solution is to log $_[0] directly (along with the backtrace).

        In this particular case, the interesting backtrace is the C backtrace, but that isn't as easy to get (you'd likely need a non-stripped build and gcc and...).

        - tye        

Re: Request help for work around for bug in perl 5.10.1
by bulk88 (Priest) on Dec 13, 2012 at 04:37 UTC

      My code wasn't using weak references, so if that's the problem, it must have been strictly internal to perl libraries.

      At this point, it looks like the problem is instigated by the C program failing to properly close the socket after a previous communication. Unfortunately, Can't seem to reproduce the problem now.

        So I now have the full diagnostic output:

        Assertion ((svtype)((sv)->sv_flags & 0xff)) != 0xff failed: file "sv.c +", line 5400 at /usr/local/lib/perl5/5.10.1/mach/IO/ line 48 +0.
        This is indeed in sv_clear as some mentioned earlier.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1008575]
Approved by GrandFather
[Yary]: Looking at system pm's, looks like it's "package" on the first line, then POD, then "use;use strict; use warnings; use modules..."
[Yary]: ah thanks for those links
[hippo]: Just bear in mind that all such ideas are subjective.
[choroba]: There might be a different standard at the shop you write the code for.
[choroba]: The order of package - (strict+warnings | warnings+strict) - modules makes sense, POD and strict-warnings order are not so important

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2018-01-23 17:07 GMT
Find Nodes?
    Voting Booth?
    How did you see in the new year?

    Results (250 votes). Check out past polls.