MCE Sandbox 2023-08

by marioroy (Prior)
on Aug 28, 2023 at 06:03 UTC ( [id://11154085]=CUFP )

The MCE Sandbox repository is where I experiment with writing fast code using Perl MCE + Inline::C, Math::Prime::Util, and the C/C++ libprimesieve library. The demos and examples folders are new for the 2023 update. Along the way, I learned Codon, a Python-like language that compiles to native code.

.Inline/              Where Inline::C is configured to cache C object files.

bin/
   algorithm3.pl      Practical sieve based on Algorithm3 from Xuedong Luo [1].
   primesieve.pl      Calls the primesieve.org C API for generating primes.
   primeutil.pl       Utilizes the Math::Prime::Util module for primes.

demos/
   primes1.c          Algorithm3 in C with OpenMP directives.
   primes2.codon      Algorithm3 in Codon, a Python-like language.
   primes3.c          Using libprimesieve C API in C
   primes4.codon      Using libprimesieve C API in Codon

examples/             Progressive demonstrations.
   practicalsieve.c   single big loop
   segmentsieve.c     segmented variant, faster
   rangesieve.c       process range; start stop
   prangesieve.c      parallel rangesieve in C
   cpusieve.codon     parallel rangesieve in Codon (CPU)
   gpusieve.codon     parallel rangesieve in Codon (GPU)
   pgpusieve.codon    using Codon @par(gpu=True) syntax
   cudasieve.cu       using NVIDIA CUDA Toolkit

lib/
   Sandbox.pm         Common code for the bin scripts.
   CpuAffinity.pm     CPU Affinity support on Linux.

src/
   algorithm3.c       Inline::C code for algorithm3.pl.
   bits.h             Utility functions for byte array.
   output.h           Fast printing of primes to a file descriptor.
   primesieve.c       Inline::C code for primesieve.pl.
   sandbox.h          Header file, includes bits.h, output.h, sprintull.h.
   sprintull.h        Fast base10 to string conversion.
   typemap            Type-map file for Inline::C.
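
For readers unfamiliar with the step from a "single big loop" to a "segmented variant" in the examples folder, here is a minimal sketch of the idea. It is not the repository's Algorithm3 code; it is a plain segmented sieve of Eratosthenes, and the names count_primes_segmented and SEGMENT_SIZE are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment size, picked so one segment fits in CPU cache. */
#define SEGMENT_SIZE 32768

/* Count primes in [2, limit] with a plain segmented sieve of Eratosthenes.
 * Illustration only; this is not the Algorithm3 implementation. */
uint64_t count_primes_segmented(uint64_t limit)
{
    if (limit < 2) return 0;
    /* Guard against wrap-around in the segment loop below. */
    assert(limit < UINT64_MAX - SEGMENT_SIZE);

    /* Integer square root of limit. */
    uint64_t root = 1;
    while ((root + 1) * (root + 1) <= limit) root++;

    unsigned char *is_composite = calloc((size_t)(root + 1), 1);
    uint64_t *base = malloc((size_t)(root + 1) * sizeof *base);
    unsigned char *segment = malloc(SEGMENT_SIZE);
    assert(is_composite && base && segment);

    /* Step 1: collect the base primes up to sqrt(limit) in one small loop. */
    size_t nbase = 0;
    for (uint64_t i = 2; i <= root; i++) {
        if (is_composite[i]) continue;
        base[nbase++] = i;
        for (uint64_t j = i * i; j <= root; j += i) is_composite[j] = 1;
    }

    /* Step 2: sieve [2, limit] one cache-sized segment at a time. */
    uint64_t count = 0;
    for (uint64_t low = 2; low <= limit; low += SEGMENT_SIZE) {
        uint64_t high = low + SEGMENT_SIZE - 1;
        if (high > limit) high = limit;
        memset(segment, 0, SEGMENT_SIZE);

        for (size_t k = 0; k < nbase; k++) {
            uint64_t p = base[k];
            if (p * p > high) break;
            /* First multiple of p inside the segment, never below p*p. */
            uint64_t start = (low + p - 1) / p * p;
            if (start < p * p) start = p * p;
            for (uint64_t j = start; j <= high; j += p)
                segment[j - low] = 1;
        }
        for (uint64_t n = low; n <= high; n++)
            if (!segment[n - low]) count++;
    }

    free(segment);
    free(base);
    free(is_composite);
    return count;
}
```

Calling count_primes_segmented(100) returns 25. The working set stays at one segment plus the base primes, which is the reason a segmented sieve tends to be faster than one large array.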

Replies are listed 'Best First'.
Re: MCE Sandbox 2023-08
by marioroy (Prior) on Jan 23, 2024 at 13:00 UTC

    Counting Primes

    It took me some time, on and off, to get the OpenMP demonstrations to perform similarly to Perl MCE + Inline::C. When only counting prime numbers, primes1.c now performs like algorithm3.pl. Likewise, primes3.c and primes4.codon perform like the primesieve binary or primesieve.pl.

    Testing was done on a 32-core machine.

    # Algorithm3

    $ ./bin/algorithm3.pl 1e12
    Primes found: 37607912018
    Seconds: 14.711

    $ ./demos/primes1.gcc 1e12
    Primes found: 37607912018
    Seconds: 14.499

    $ ./demos/primes1.clang 1e12
    Primes found: 37607912018
    Seconds: 14.587

    $ ./demos/primes1.nvc 1e12
    Primes found: 37607912018
    Seconds: 14.858

    $ ./demos/primes2 1e12
    Primes found: 37607912018
    Seconds: 20.204

    # Primesieve

    $ /usr/local/bin/primesieve 1e12
    Sieve size = 256 KiB
    Threads = 64
    100%
    Seconds: 5.597
    Primes: 37607912018

    $ ./bin/primesieve.pl 1e12
    Primes found: 37607912018
    Seconds: 5.707

    $ ./demos/primes3.gcc 1e12
    Primes found: 37607912018
    Seconds: 5.696

    $ ./demos/primes3.clang 1e12
    Primes found: 37607912018
    Seconds: 5.767

    $ ./demos/primes3.nvc 1e12
    Primes found: 37607912018
    Seconds: 5.841

    $ ./demos/primes4 1e12
    Primes found: 37607912018
    Seconds: 5.719
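
    Counting is the easy case for OpenMP because each thread only contributes a number. Below is a minimal sketch of that pattern, assuming a compiler with OpenMP enabled (for example -fopenmp with GCC); without it the pragma is ignored and the loop runs serially. It is not the code from primes1.c. The function names are hypothetical, and slow trial division stands in for the real sieve to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in kernel: trial division, far slower than a real sieve. */
static int is_prime(uint64_t n)
{
    if (n < 2) return 0;
    if (n < 4) return 1;
    if (n % 2 == 0) return 0;
    for (uint64_t d = 3; d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Count primes in [start, stop]. The range is split into fixed-size chunks,
 * threads pick up chunks dynamically, and the per-chunk counts are summed
 * with an OpenMP reduction. */
uint64_t count_primes_parallel(uint64_t start, uint64_t stop,
                               uint64_t chunk_size)
{
    assert(chunk_size > 0 && start <= stop);
    int64_t nchunks = (int64_t)((stop - start) / chunk_size + 1);
    uint64_t total = 0;

    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int64_t c = 0; c < nchunks; c++) {
        uint64_t lo = start + (uint64_t)c * chunk_size;
        uint64_t hi = lo + chunk_size - 1;
        if (hi > stop) hi = stop;

        uint64_t local = 0;
        for (uint64_t n = lo; n <= hi; n++)
            local += (uint64_t)is_prime(n);
        total += local;   /* combined across threads by the reduction */
    }
    return total;
}
```

    For example, count_primes_parallel(1, 100, 10) returns 25 regardless of the number of threads, since no thread has to wait for another until the final sum.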

    Printing Primes

    Outputting prime numbers is another story. Workers using MCE write their output to /dev/shm in parallel and pass the chunk_id to the manager process, which emits the chunks in order. This is very fast. The C and Codon demonstrations write directly to STDOUT in order; here, threads wait their turn.
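
    To illustrate what "threads wait their turn" can look like, here is a minimal sketch using the OpenMP ordered construct. This is an assumption about the general pattern, not the code from the demos: each thread formats its chunk into a private buffer in parallel, then writes it only when all earlier chunks have been written. The function names are hypothetical and trial division again stands in for the sieve.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in kernel: trial division, far slower than a real sieve. */
static int is_prime(uint64_t n)
{
    if (n < 2) return 0;
    if (n < 4) return 1;
    if (n % 2 == 0) return 0;
    for (uint64_t d = 3; d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Print primes in [start, stop] to `out`, one per line, in ascending order.
 * Returns the number of primes printed. */
uint64_t print_primes_ordered(FILE *out, uint64_t start, uint64_t stop,
                              uint64_t chunk_size)
{
    assert(out != NULL && chunk_size > 0 && start <= stop);
    int64_t nchunks = (int64_t)((stop - start) / chunk_size + 1);
    uint64_t total = 0;

    #pragma omp parallel for ordered schedule(static, 1) reduction(+:total)
    for (int64_t c = 0; c < nchunks; c++) {
        uint64_t lo = start + (uint64_t)c * chunk_size;
        uint64_t hi = lo + chunk_size - 1;
        if (hi > stop) hi = stop;

        /* Parallel part: format this chunk into a private buffer.
         * 21 bytes covers the longest uint64_t (20 digits) plus newline. */
        char *buf = malloc((size_t)(hi - lo + 1) * 21 + 1);
        assert(buf != NULL);
        size_t len = 0;
        uint64_t local = 0;
        for (uint64_t n = lo; n <= hi; n++) {
            if (is_prime(n)) {
                len += (size_t)sprintf(buf + len, "%" PRIu64 "\n", n);
                local++;
            }
        }

        /* Serial part: chunks are written strictly in loop order, so a
         * thread that finishes early waits here for its turn. */
        #pragma omp ordered
        {
            fwrite(buf, 1, len, out);
        }

        free(buf);
        total += local;
    }
    return total;
}
```

    Calling print_primes_ordered(stdout, 1, 20, 4) prints the eight primes from 2 to 19 in ascending order. The wait at the ordered region is where the threads sit idle, in contrast to the MCE approach, where workers keep writing in parallel and only the manager handles ordering.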

    The saddest moment was witnessing OpenMP consume power unnecessarily while threads wait. I created an issue ticket for LLVM OpenMP and NVIDIA HPC OpenMP. IMHO, only GCC OpenMP passes in this regard. This is the reason GCC ran faster than Clang and NVIDIA NVC.

    The output for 1e10 is 4.6 GB. Be sure to pipe it to a command (e.g. cksum) or redirect it to /dev/null.

    # Algorithm3

    $ ./bin/algorithm3.pl 1e10 -p >/dev/null
    Seconds: 0.743

    $ ./demos/primes1.gcc 1e10 -p >/dev/null
    Seconds: 10.249

    $ ./demos/primes1.clang 1e10 -p >/dev/null
    Seconds: 12.696

    $ ./demos/primes1.nvc 1e10 -p >/dev/null
    Seconds: 14.326

    $ ./demos/primes2 1e10 -p >/dev/null
    Seconds: 12.369

    # Primesieve
    # the primesieve binary uses one core when -p is given

    $ time /usr/local/bin/primesieve 1e10 -p >/dev/null
    Seconds: 14.379

    $ ./bin/primesieve.pl 1e10 -p >/dev/null
    Seconds: 0.680

    $ ./demos/primes3.gcc 1e10 -p >/dev/null
    Seconds: 7.145

    $ ./demos/primes3.clang 1e10 -p >/dev/null
    Seconds: 8.826

    $ ./demos/primes3.nvc 1e10 -p >/dev/null
    Seconds: 11.249

    $ ./demos/primes4 1e10 -p >/dev/null
    Seconds: 8.597

Re: MCE Sandbox 2023-08
by karlgoethebier (Abbot) on Aug 28, 2023 at 12:30 UTC
    «…Codon…»

    Thesis + supervisor

    «The Crux of the Biscuit is the Apostrophe»
