Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Re: Request help for work around for bug in perl 5.10.1 (weak magic)

by tye (Sage)
on Dec 12, 2012 at 20:11 UTC ( #1008580=note: print w/replies, xml ) Need Help??

in reply to [Solved] Request help for work around for bug in perl 5.10.1

'Assertion ((svtype)((sv)->sv_flags & 0xff)) != 0xff failed: f...'

Did you notice that you failed to ever include the actual full error message? It is a nice touch to include the call stack that you included. But I wouldn't be surprised if there is useful information that has been replaced by "f..." in what you've provided.

I did a very quick look to see if I could find this assertion. I failed to find anything in your report that was much use in pointing to a specific assertion. But I did stumble upon this part of sv.h which seems to match the expression in the part of the error message that you preserved:

/* Sadly there are some parts of the core that have pointers to alread +y-freed SV heads, and rely on being able to tell that they are now free. So + mark them all by using a consistent macro. */ #define SvIS_FREED(sv) ((sv)->sv_flags == SVTYPEMASK)

And searching for SvIS_FREED in the 5.10.1 source code only turned up this candidate:

/* delete a back-reference to ourselves from the backref magic associa +ted * with the SV we point to. */ STATIC void S_sv_del_backref(pTHX_ SV *tsv, SV *sv) { ... assert(!SvIS_FREED(av));

Which I'm pretty sure is code that gets run when a weak ref is destroyed. But then the evidence that this is the assertion is rather indirect (but is enough to convince me -- after I did another search for "SVTYPEMASK" and found no candidates).

Running "git grep -i weak perl-5.10.1 ext/IO" finds only the use of the word "tweak" in ext/IO/ChangeLog. So I don't see the weak ref being the fault of IO::Socket::* or IO::Handle (all of which live under ext/IO in the Perl source code).

Are you using weak references somewhere? Not that I find it easy to imagine how your use of a weak reference could matter to the call to IO::Socket::Unix->new() that you showed.

A significant number of cases of successfully fixing similarly mysterious bugs leads me to suggest the following technique (which I've seen succeed significantly more often than I've seen it fail, somewhat astonishingly):

Look through the modules that you are making use of and find the one that makes use of the most "magic". In particular, modules that make use of XS are especially good candidates. Switch to a newer or older version of that module. Even better, stop using that module. You can often replace such modules with similar modules that don't use XS.

Often this will make the error go away or just happen even less frequently. Other times, I've seen it just change the error to something that is easier to debug (sometimes because it happens much more frequently and so can be reproduced in an environment where more details can be extracted). Getting rid of XS modules usually just makes the error go away (astonishingly often).

Update: Using threads is also an easy way to trigger mysterious errors, IME. (It isn't that you can't successfully use Perl "threads"; it is that is way too easy to get bizarre error messages when using threads, errors that just go away when you stop using "threads".)

- tye        

Replies are listed 'Best First'.
Re^2: Request help for work around for bug in perl 5.10.1 (weak magic)
by dave_the_m (Prior) on Dec 12, 2012 at 21:21 UTC
    That assert also appears in sv_clear(). But as you say, truncating the error message is annoying. Normally an assert tells us the C source file and line number of the error.


      That assert also appears in sv_clear().

      Indeed! In fact, that assertion is likely the culprit. I previously missed that line from my "git grep SVTYPEMASK perl-5.10.1" output.

      But now I realize that the assertion that I pointed out can't be the one that is to blame because it uses "!(...==...)" which might well get changed by the optimizer into "...!=..." but such an optimization would not impact the string that the C preprocessor produces and that gets included in the error message.

      So the "weak ref" stuff was a red herring.

      So this bug is a little more mundane. We have a scalar with a ref count of 0 that has been freed that is trying to be cleared again. Eliminating XS modules (or up- or down-grading them) can still be useful, but is not as likely to matter as with the more mysterious scenario of a rogue weak ref being destroyed, IME.

      The smell of this bug is becoming more similar to "subtle bugs in Perl that only seem to appear in long-running Perl daemons". So the bug might stop happening if the daemon can be made to run something like:

      exec( $^X, $0, @ARGV ); die "Couldn't exec self: $!";

      Every so often (like once per day).

      Update: Use of "threads" still seems plausible as a major contributing factor.

      - tye        

Re^2: Request help for work around for bug in perl 5.10.1 (weak magic)
by babel17 (Acolyte) on Dec 12, 2012 at 21:43 UTC
    So I apologize for the lack of clear error messages. The code is detached, so I have to trap __DIE__ and write to syslog. This is done as follows:
    $SIG{__DIE__} = \&die_catcher; ... # # die_catcher # # Try to log the call stack if we die # sub die_catcher { for my $caller ( CallStack() ) { alog(LOG_ERR, $caller); } } ... # # Courtesy of Perl Monks # sub CallStack { local $@; eval { confess( '' ) }; my @stack = split m/\n/, $@; shift @stack for 1..3; # Cover our tracks. return wantarray ? @stack : join "\n", @stack; }
    I'm not really that familiar with Carp, I'll see what I can figure out to make it give a more complete backtrace.
      I'm not really that familiar with Carp, I'll see what I can figure out to make it give a more complete backtrace.

      The problem wasn't that the backtrace was unacceptable. The problem was that you simply failed to log @_ (really just $_[0]), which is the most important thing to log when logging inside of a __DIE__ handler.

      Luckily, the backtrace also happened to include a truncated version of that message. But the solution isn't to try to get the backtrace to not truncate. The solution is to log $_[0] directly (along with the backtrace).

      In this particular case, the interesting backtrace is the C backtrace, but that isn't as easy to get (you'd likely need a non-stripped build and gcc and...).

      - tye        

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1008580]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2018-06-25 01:10 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (126 votes). Check out past polls.