Re^4: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off

by marioroy (Prior)
on Feb 28, 2022 at 23:18 UTC


in reply to Re^3: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off
in thread XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off

Hi, etj

PDL 2.076 makes noticeably better use of the L1/L2/L3 cache. That is something to be proud of and worth celebrating. Unfortunately, 2.076 also introduces an anomaly. IMHO, try storing $$.$tid (process ID and thread ID) somewhere in the object. For PDL to play well with threads, cleanup must be performed by the originating thread, i.e. the one that instantiated the object.
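A minimal sketch of the save/check idea, in plain Perl rather than XS. The class name `OwnedHandle`, the helper `_owner_key` and the `on_release` callback are hypothetical names for illustration, not part of PDL or MCE. The object records the "pid.tid" key at construction time, and DESTROY does the real cleanup only when the current key matches.

```perl
use strict;
use warnings;

# Hypothetical class, not part of PDL: it remembers which process/thread
# created the object and lets only that one run the real cleanup.
package OwnedHandle;

# Current "pid.tid" key; tid is taken as 0 when threads.pm is not loaded.
sub _owner_key {
    my $tid = $INC{'threads.pm'} ? threads->tid() : 0;
    return "$$.$tid";
}

sub new {
    my ($class, $on_release) = @_;
    return bless { owner => _owner_key(), on_release => $on_release }, $class;
}

sub DESTROY {
    my ($self) = @_;
    # Clones destroyed in another thread (or a forked child) skip cleanup.
    return if $self->{owner} ne _owner_key();
    $self->{on_release}->() if $self->{on_release};
}

package main;

my $released = 0;
{
    my $h = OwnedHandle->new(sub { $released++ });
}   # $h goes out of scope here, in the creating thread
print "released: $released\n";
```

Under ithreads each thread gets its own clone of the object and DESTROY runs for every clone, so the guard turns destruction in non-originating threads into a no-op.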

I will update the MCE demonstrations with a user_begin block to work around the issue. I can do this safely because the piddles are instantiated by the main thread; i.e. $$.$tid == $$.0

Update: Looking at the code, MCE workers also form piddles for the Strassen demonstrations. So I cannot do this unless I save/check the thread ID.

    user_begin => sub {
        # PDL 2.076 introduced a regression causing all threads to perform
        # piddle destruction. PDL cleanup should occur once, preferably by
        # the thread which instantiated the object.
        no warnings;
        sub PDL::DESTROY { 0; }
    },

Since I cannot do the above, I want to mention that the strassen_07_t.pl example in the repo works fine as is once the following line, introduced in PDL 2.076, is removed.

SvOBJECT_off((SV *)it->sv);

Replies are listed 'Best First'.
Re^5: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off
by marioroy (Prior) on Mar 01, 2022 at 08:39 UTC

    I searched the web and came across this page where the developer tries SvOBJECT_off. Stas states, "So we get all kind of problems when automatically dereferencing it."

        SvOK_off(sv);
        SvIVX(sv) = 0;
        SvOBJECT_off(sv);

    It is heartwarming (given the frustrations folks sometimes face when hoping for, or working on, thread safety) to read, "A working solution is needed to make mp2 API perl-ithreads-safe as it's not at the moment, ...".

    Stas settled on the following instead.

        SV *sv = SvRV(obj);
        if (sv) {
            /* detach from the C struct and invalidate */
            mg_free(sv);      /* remove any magic */
            SvFLAGS(sv) = 0;  /* invalidate the sv */
        }

    I tried replacing SvOBJECT_off with mg_free in PDL 2.076.

        /* Clear the sv field so that there will be no dangling ptrs */
        if (it->sv) {
            // SvOBJECT_off((SV *)it->sv);  /* problematic, issue #385 */
            mg_free(it->sv);                /* remove any magic instead */
            sv_setiv(it->sv,0x4242);
            it->sv = NULL;
        }

    etj, will that work? The Strassen demonstrations work fine, and I see no adverse effects during global destruction.

      I have implemented this on the current git master branch, and intend to release it very soon, after I have made a couple more tweaks to the demos system which has finally got overhauled. Thanks for the amazing research!

        Thank you for the enlightenment on using PDL::LinearAlgebra::Real. I updated the examples.

        Passing a flag to the script will attempt to load PDL::LinearAlgebra::Real.
        If available, PDL::LinearAlgebra::Real computes faster via LAPACK/OpenBLAS.
        Use PDL 2.077 or later for best results. Also check that OpenBLAS is OpenMP-enabled, i.e.
        pkg-config --variable=openblas_config openblas | grep -c USE_OPENMP
        
        perl matmult_base.pl  4096        # 54.685s built-in matrix multiply
        perl matmult_base.pl  4096 1      #  6.706s LAPACK/OpenBLAS 1 thread
        perl matmult_base.pl  4096 4      #  1.727s LAPACK/OpenBLAS 4 threads
        
        perl matmult_mce_d.pl 4096 4      # 12.468s built-in matrix multiply
        perl matmult_mce_d.pl 4096 4 1    #  1.915s LAPACK/OpenBLAS 4 threads
        
        perl matmult_mce_f.pl 4096 4      # 11.950s built-in matrix multiply
        perl matmult_mce_f.pl 4096 4 1    #  1.836s LAPACK/OpenBLAS 4 threads
        
        perl matmult_mce_t.pl 4096 4      # 12.245s built-in matrix multiply
        perl matmult_mce_t.pl 4096 4 1    #  1.856s LAPACK/OpenBLAS 4 threads
        
        perl matmult_simd.pl  4096 4      # 16.136s built-in matrix multiply
        perl matmult_simd.pl  4096 4 1    #  1.763s LAPACK/OpenBLAS 4 threads
        
        perl strassen_07_f.pl 4096        #  3.516s built-in matrix multiply
        perl strassen_07_f.pl 4096 1      #  1.915s LAPACK/OpenBLAS 7 threads
        
        perl strassen_07_t.pl 4096        #  3.658s built-in matrix multiply
        perl strassen_07_t.pl 4096 1      #  2.072s LAPACK/OpenBLAS 7 threads
        

        Look at matmult_base.pl go :) This is possible with OpenMP-enabled LAPACK/OpenBLAS libs.
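The opt-in loading described above could look roughly like this. It is only a sketch: the helper `try_load` and the argument position of the flag are assumptions for illustration, and the actual demo scripts may do this differently. The point is that the optional module is attempted only when the flag is given, and the script falls back to the built-in multiply when the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Assumed layout: first argument is the matrix size, second is the flag.
my $flag = $ARGV[1];

# Attempt the optional module only when the flag was passed.
my $use_lapack = ($flag && try_load('PDL::LinearAlgebra::Real')) ? 1 : 0;

print $use_lapack
    ? "using LAPACK/OpenBLAS via PDL::LinearAlgebra::Real\n"
    : "using built-in matrix multiply\n";
```

Wrapping `require` in a block `eval` keeps a missing module from being fatal, so the same script runs on systems with and without PDL::LinearAlgebra::Real.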
