PerlMonks  

shared scalar freed early

by chris212 (Scribe)
on Feb 22, 2017 at 16:11 UTC ( #1182539=perlquestion )

chris212 has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that uses threading. It has a shared variable ($ret) that is used to indicate if an error has occurred. All the threads check that variable so that work will stop in the event of a fatal error.

Unfortunately I cannot post the script. I'm not able to create a test script to replicate the issue since it is very intermittent and only seems to happen with long runs (over 4 hours). It doesn't use much memory, so it doesn't seem to be a memory leak.

Basically, the script will start a new thread to handle writing output, and the main thread will start possibly millions of threads to process 500 records each as they are read, using a semaphore to limit the number of concurrent threads. All threads check the value of the shared scalar. Any of them can modify the value, but none did when these crashes occurred.
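The design described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual script (which is not posted): the `worker` and `run` subs, the limit of 4 concurrent threads, and the use of an undefined record as a stand-in for a fatal error are all assumptions.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $ret :shared = 0;                     # 0 = ok, -1 = fatal error
my $sem = Thread::Semaphore->new(4);     # assumed limit: 4 workers at once

# Hypothetical worker: stops early if any thread has already flagged an error.
sub worker {
    my (@records) = @_;
    for my $rec (@records) {
        last if $ret == -1;
        if (!defined $rec) {             # stand-in for a real failure
            lock($ret);
            $ret = -1;
        }
    }
    $sem->up();                          # free a slot for the next batch
    return;
}

# Start one thread per batch, never more than the semaphore allows.
sub run {
    my (@batches) = @_;
    for my $batch (@batches) {
        last if $ret == -1;
        $sem->down();                    # block until a slot is free
        $_->join() for threads->list(threads::joinable);
        threads->create(\&worker, @$batch);
    }
    $_->join() for threads->list();      # wait for the remaining workers
    return $ret;
}

print run([1, 2], [3, 4], [5, 6]), "\n";
```

Every worker both reads and (on failure) writes `$ret`, which is the sharing pattern the rest of this thread is about.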

>> /polaris_stg_root/dev/app/smartload/components/correctaddress_debug/correctaddress_debug.pl:11: $Devel::Trace::TRACE = 0;
SV = PVMG(0x10a6350) at 0xf68f68
  REFCNT = 5
  FLAGS = (PADMY,GMG,SMG,IOK,pIOK)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0xd5fb30
    MG_VIRTUAL = 0x7f361a0c9320
    MG_TYPE = PERL_MAGIC_shared_scalar(n)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0xee8f08 ""
SV = PVMG(0x114ec30) at 0x114d558
  REFCNT = 5
  FLAGS = (PADMY,GMG,SMG,IOK,pIOK)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x1150a70
    MG_VIRTUAL = 0x7f361a0c9320
    MG_TYPE = PERL_MAGIC_shared_scalar(n)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0xee8f08 ""
Attempt to free unreferenced scalar: SV 0xee8f08, Perl interpreter: 0xee6410.
>> /polaris_stg_root/dev/app/smartload/components/correctaddress_debug/correctaddress_debug.pl:884: exit($ret) if($ret == -1); # already failed, don't compare counts or print stats
panic: attempt to copy freed scalar ee8f08 to f68f68 at /polaris_stg_root/dev/app/smartload/components/correctaddress_debug/correctaddress_debug.pl line 884.
Attempt to free unreferenced scalar: SV 0xee8f68, Perl interpreter: 0xee6410.
Attempt to free unreferenced scalar: SV 0xee8f08, Perl interpreter: 0xee6410.

One dump is the $ret variable from before the output thread returns. The other is from the main thread after all the input is read. I should have dumped it after the output thread is joined and will if I can make it crash again. The main thread does not have any references to $ret between dumping it and line 884.

UPDATE

I got a dump after the output thread is joined, and the refcount is still 5. There are no references to $ret until it crashes, so it seems the memory is freed even though the refcount is 5?

>> /polaris_stg_root/dev/data/QAS_TEST/correctaddress_debug.pl:11: $Devel::Trace::TRACE = 0;
Main thread before output thread finishes:
SV = PVMG(0x19555c0) at 0x1822318
  REFCNT = 5
  FLAGS = (PADMY,GMG,SMG,IOK,pIOK)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x190b330
    MG_VIRTUAL = 0x7fe80f9de320
    MG_TYPE = PERL_MAGIC_shared_scalar(n)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0x17a7b68 ""
Output thread before returning:
SV = PVMG(0x19f2730) at 0x1a3d280
  REFCNT = 5
  FLAGS = (PADMY,GMG,SMG,IOK,pIOK)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x1a3e2e0
    MG_VIRTUAL = 0x7fe80f9de320
    MG_TYPE = PERL_MAGIC_shared_scalar(n)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0x17a7b68 ""
Main thread after output thread finishes:
SV = PVMG(0x19555c0) at 0x1822318
  REFCNT = 5
  FLAGS = (PADMY,GMG,SMG,IOK,pIOK)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x190b330
    MG_VIRTUAL = 0x7fe80f9de320
    MG_TYPE = PERL_MAGIC_shared_scalar(n)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0x17a7b68 ""
>> /polaris_stg_root/dev/data/QAS_TEST/correctaddress_debug.pl:884: exit($ret) if($ret == -1); # already failed, don't compare counts or print stats
panic: attempt to copy freed scalar 17a7b68 to 1822318 at /polaris_stg_root/dev/data/QAS_TEST/correctaddress_debug.pl line 884.
Attempt to free unreferenced scalar: SV 0x17a7b68, Perl interpreter: 0x1775870.

UPDATE 2

I changed the script to not use the global $ret variable (shared or not) from the input, output, or worker threads. This seems to keep my script from crashing. I may use marioroy's MCE approach in a future version.
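For anyone hitting the same thing, here is a minimal sketch of one way to avoid touching a shared `$ret` from the threads at all, assuming each worker can report failure through its return value instead. The `process_batch` and `run_batches` names and the failure check are hypothetical; this is not the actual change made to the script.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: returns 0 on success, -1 on a fatal error.
sub process_batch {
    my (@records) = @_;
    for my $rec (@records) {
        return -1 if !defined $rec;      # stand-in for a real failure check
    }
    return 0;
}

# Run one thread per batch and collect each status through join(),
# so the workers never read or write a shared scalar.
sub run_batches {
    my (@batches) = @_;
    my @workers;
    for my $batch (@batches) {
        push @workers,
            threads->create({ context => 'scalar' }, \&process_batch, @$batch);
    }
    my $ret = 0;
    for my $thr (@workers) {
        my $status = $thr->join();       # undef if the thread died
        $ret = -1 if !defined $status || $status != 0;
    }
    return $ret;
}

print run_batches([1, 2, 3], [4, 5, 6]), "\n";
```

The trade-off is that workers no longer stop early when another thread fails; only the main thread learns about the error, and only at join time.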

Replies are listed 'Best First'.
Re: shared scalar freed early
by Corion (Patriarch) on Feb 22, 2017 at 17:29 UTC

    In addition to BrowserUk's diagnosis, my random guess would be that you have a not-so-threadsafe XS component involved, which thrashes some part of memory. Perl recognizes when memory is allocated in one thread and then freed in another thread and does not like that, hence your error message. Maybe this is caused by some XS component used from more than one thread.

    One approach to test this hypothesis would be to rewrite your script so that no modules are used which have an XS component, and to replace all essential parts with pure Perl components. Ideally, this makes the problem go away.

      Actually, I forgot about the 3rd party SWIG module that is require'ed in the main thread (conditionally, Win32::API for Windows). The worker threads inherit and use it. That could very well be the culprit, although I'm not sure why it would change the refcount on my shared scalar. I could try taking out the module and see if I can replicate the problem. It should run much faster without it.
        Nope. I got it to crash after removing the module, so I don't think that module is the problem.
      There are no XS components being utilized when it crashed. The only non-core component that can be used is Text::CSV_XS, but it will crash even if the "require Text::CSV_XS" line is never executed.
        There are no XS components being utilized when it crashed. The only non-core component that can be used is ...

        There are many core modules that have an XS component.

        Cheers,
        Rob
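        To see which XS components are actually loaded, something like the following may help. It is a minimal sketch that relies on `@DynaLoader::dl_modules`, the list DynaLoader and XSLoader keep of modules whose compiled part has been loaded; the `loaded_xs_modules` name is made up for the example, and `List::Util` is loaded only so that there is something to show.

```perl
use strict;
use warnings;
use List::Util ();    # a core module with an XS component
use DynaLoader ();

# Return the names of modules whose compiled (XS) part has been loaded
# into this interpreter so far, as recorded by DynaLoader/XSLoader.
sub loaded_xs_modules {
    my @names = sort @DynaLoader::dl_modules;
    return @names;
}

print "$_\n" for loaded_xs_modules();
```

        Calling the sub near the end of the real script, after every `require` has had a chance to run, would give a more complete list than calling it at startup.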
Re: shared scalar freed early
by ikegami (Patriarch) on Feb 22, 2017 at 20:18 UTC

    What version of Perl are you using? No, that's not the right question. The real question is: Do you still have the problem with the latest version of Perl?

      5.18.2

      That is currently the only version I have available to test, but I could request a newer version. I'm not sure what the latest version approved by our security is.

        perl 5.24.1 is the current stable release, for your information, and is available for Unix/Linux, and for Windows ActiveState and Strawberry.

        Bureaucracy is completely understandable by most here (I'm sure), so I might advise to test things on more recent versions (after taking all the advice provided into consideration) on the side, and if results are improved, you have justification to request a change. That, combined with the fine documentation on each release perldoc perldelta, may help show that allowing newer releases is pretty reasonable.
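        When comparing runs across versions, it can help to record which perl each run used. A small sketch, assuming only the core `Config` module; the `perl_build_summary` name is made up for the example:

```perl
use strict;
use warnings;
use Config;

# Summarise the running perl: its version and whether it was built with
# interpreter threads (which the threads module requires).
sub perl_build_summary {
    my $threaded = $Config{useithreads} ? 'yes' : 'no';
    return sprintf 'perl %vd, ithreads: %s', $^V, $threaded;
}

print perl_build_summary(), "\n";
```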

        I just went through this with a $work client, and my job is designing solutions for a significant C++ suite with Python hooks. I wanted Perl because I can debug and write quick tests for a Python script that accesses the underlying C++ dll quicker than I could by using Python directly. Client uses Perl for a few things, but they mandated v5.8.8. That changed pretty quickly in this 290k+ user environment, after I demo'd a couple of quick things that used Perl to aggravate Python to access what I needed done.

Re: shared scalar freed early
by BrowserUk (Patriarch) on Feb 22, 2017 at 17:26 UTC
    Unfortunately I cannot post the script.

    There is simply not enough information here to even begin to diagnose the problem. It's equivalent to going to the garage because your car is overheating and presenting them with two instantaneous temp. gauge readings, and no car.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I guess I was hoping someone might be familiar with a known defect causing the "overheating" or might point me in the direction for troubleshooting it (like check oil and coolant). I've been trying to find out what I can about it. A variable will not be freed unless the refcount reaches zero? So why would Perl decrement the refcount for a shared scalar in a threaded script too many times? Perl is doing something it shouldn't, right?

        My best guess based on the scant information is that your main thread is ending without properly waiting for your other threads to be cleaned up. But it is nothing but a guess.
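        A minimal sketch of what "properly waiting" could look like, assuming a shared `$ret` like the one in the question. The `join_all` helper and the three trivial workers are made up for the example; the real script is not shown here.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ret :shared = 0;

# Join every thread that has not been joined or detached yet and
# return how many were joined.
sub join_all {
    my $joined = 0;
    for my $thr (threads->list()) {
        $thr->join();
        $joined++;
    }
    return $joined;
}

# Three trivial workers that only read the shared flag.
threads->create(sub { return $ret == -1 ? 0 : 1 }) for 1 .. 3;

my $joined = join_all();
print "joined $joined thread(s)\n";

# Only after all threads are joined does the main thread read $ret and exit.
exit($ret) if $ret == -1;
```

        If any thread is still running when the main thread reaches `exit`, Perl warns about it and tears the interpreter down under that thread, so joining first rules that out as a cause.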

Re: shared scalar freed early
by Laurent_R (Canon) on Feb 22, 2017 at 18:22 UTC
    the main thread will start possibly millions of threads to process 500 records each
    Are you seriously considering millions of threads?

      Yeah, is that a problem? Did you miss the "using a semaphore to limit the number of concurrent threads" part of that sentence? Millions of threads will be started throughout the execution of the script, but not all at the same time! I can't pass too much data to the thread's sub, right?

      UPDATE:

      I just did the math, and the test data I am using to replicate the crash would create 766,747 worker threads, but only 32 at a time. I tried making the threads persistent and read data from a queue, but that was MUCH slower.

        Starting threads in Perl is rather expensive. It's far faster to reuse threads (e.g. using a worker model). You've indicated otherwise, but that simply points to a problem with the implementation you used.

        use strict;
        use warnings;
        use feature qw( say );

        use threads;
        use Thread::Queue     qw( );
        use Thread::Semaphore qw( );
        use Time::HiRes       qw( time );

        use constant MAX_WORKERS => 32;
        use constant NUM_JOBS    => 1_000;

        {
           my $sem = Thread::Semaphore->new(MAX_WORKERS);

           my $s = time;
           for my $job (1..NUM_JOBS) {
              $sem->down();
              $_->join() for threads->list(threads::joinable);
              async {
                 # ...
                 $sem->up();
              };
           }

           $_->join() for threads->list();
           my $e = time;
           say($e-$s);  # 5.88567113876343
        }

        {
           my $q = Thread::Queue->new();

           for (1..MAX_WORKERS) {
              async {
                 while (defined( my $job = $q->dequeue() )) {
                    # ...
                 }
              };
           }

           my $s = time;
           $q->enqueue($_) for 1..NUM_JOBS;
           $q->end();
           $_->join() for threads->list();
           my $e = time;
           say($e-$s);  # 0.248196125030518
        }

        Things couldn't be any more ideal for creating threads (minimal amount of variables to clone), yet creating all those threads was 25x slower than reusing a few threads. (The factor will grow as the number of jobs increases.)

        I tried making the threads persistent and read data from a queue, but that was MUCH slower.

        Sounds to me like you are doing something very strange. Creating a Perl "thread" is rather expensive. Much more expensive than pulling an item from a queue. Usually much, much more expensive.

        - tye        

        Yeah, is that a problem?
        Oh, yes it is. Spawning a very large number of threads will most certainly take a heavy toll on the machine's resources and at best slow everything down (or more probably bring your machine down), even if many of them are idle at any point in time.

        Other monks have already explained while I was off-line that it is much better to have a relatively limited number of threads picking work from a job queue (or something similar), and I absolutely agree with ikegami and tye on that.

        (And this is why I candidly asked the question in the first place, as well as to make sure I understood you correctly, because the idea seemed so extravagant to me.)

Node Type: perlquestion [id://1182539]
Approved by Corion