PerlMonks  

Re: MCE Sandbox 2023-08

by marioroy (Prior)
on Jan 23, 2024 at 13:00 UTC ( [id://11157187]=note )


in reply to MCE Sandbox 2023-08

Counting Primes

It took me some time (on and off) to get the OpenMP demonstrations to perform similarly to Perl MCE + Inline::C. When only counting prime numbers, primes1.c now performs like algorithm3.pl. Likewise, primes3.c and primes4.codon perform like the primesieve binary and primesieve.pl.

Testing was done on a 32-core machine.

# Algorithm3

$ ./bin/algorithm3.pl 1e12
Primes found: 37607912018
Seconds: 14.711

$ ./demos/primes1.gcc 1e12
Primes found: 37607912018
Seconds: 14.499

$ ./demos/primes1.clang 1e12
Primes found: 37607912018
Seconds: 14.587

$ ./demos/primes1.nvc 1e12
Primes found: 37607912018
Seconds: 14.858

$ ./demos/primes2 1e12
Primes found: 37607912018
Seconds: 20.204

# Primesieve

$ /usr/local/bin/primesieve 1e12
Sieve size = 256 KiB
Threads = 64
100%
Seconds: 5.597
Primes: 37607912018

$ ./bin/primesieve.pl 1e12
Primes found: 37607912018
Seconds: 5.707

$ ./demos/primes3.gcc 1e12
Primes found: 37607912018
Seconds: 5.696

$ ./demos/primes3.clang 1e12
Primes found: 37607912018
Seconds: 5.767

$ ./demos/primes3.nvc 1e12
Primes found: 37607912018
Seconds: 5.841

$ ./demos/primes4 1e12
Primes found: 37607912018
Seconds: 5.719

Printing Primes

Outputting prime numbers is another story. With MCE, workers write their output to /dev/shm in parallel and pass the chunk_id to the manager process, which emits the chunks in order. This is very fast. The C and Codon demonstrations write directly to STDOUT in order, so threads must wait their turn.
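To make the MCE-style flow concrete, here is a minimal Python sketch of the idea: workers write their chunk to a scratch file in parallel and report the chunk_id, and a manager emits the files in chunk_id order. It is not MCE's implementation. The names primes_in_range, worker and print_primes are made up for illustration, a temporary directory stands in for /dev/shm, threads stand in for MCE worker processes, and trial division stands in for a real sieve.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def primes_in_range(lo, hi):
    """Return primes in [lo, hi) by trial division (placeholder for a real sieve)."""
    found = []
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found.append(n)
    return found


def worker(chunk_id, lo, hi, tmpdir):
    """Write this chunk's primes to its own file, then report the chunk_id."""
    path = os.path.join(tmpdir, f"chunk_{chunk_id}.txt")
    with open(path, "w") as fh:
        for p in primes_in_range(lo, hi):
            fh.write(f"{p}\n")
    return chunk_id, path


def print_primes(limit, chunk_size, out, max_workers=4):
    """Manager: run workers in parallel, emit chunk files in chunk_id order."""
    ranges = [(lo, min(lo + chunk_size, limit + 1))
              for lo in range(0, limit + 1, chunk_size)]
    with tempfile.TemporaryDirectory() as tmpdir, \
            ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, cid, lo, hi, tmpdir)
                   for cid, (lo, hi) in enumerate(ranges)]
        finished = {}   # chunk_id -> path, for chunks done but not yet emitted
        next_id = 0     # next chunk_id the manager is allowed to output
        for fut in as_completed(futures):
            cid, path = fut.result()
            finished[cid] = path
            # Flush every chunk that is now contiguous with what was emitted.
            while next_id in finished:
                with open(finished.pop(next_id)) as fh:
                    out.write(fh.read())
                next_id += 1
```

Calling print_primes(100, 16, sys.stdout) (after importing sys) would print the primes up to 100 in ascending order regardless of which worker finishes first. Workers never wait on each other here; only the manager holds back chunks that arrive early.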

The saddest moment was seeing OpenMP consume power unnecessarily on waiting threads. I created an issue ticket for LLVM OpenMP and for NVIDIA HPC OpenMP. IMHO, only GCC OpenMP passes in this regard. This is the reason the GCC builds ran faster than the Clang and NVIDIA NVC builds.
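For readers unfamiliar with the distinction, a thread that waits its turn can either spin (polling a flag and keeping a core busy) or block (sleeping until it is woken). The post does not say how each OpenMP runtime waits, so the following is only a generic Python sketch of a blocking turn-taking scheme, not a description of GCC, LLVM or NVIDIA internals. TurnGate and run_ordered are hypothetical names.

```python
import threading


class TurnGate:
    """Let threads write in chunk_id order, sleeping while they wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_id = 0   # chunk_id that is allowed to write next

    def write_in_turn(self, chunk_id, text, out):
        with self._cond:
            # wait_for() releases the lock and blocks the thread until
            # notified; there is no polling loop keeping a core busy.
            self._cond.wait_for(lambda: self._next_id == chunk_id)
            out.write(text)
            self._next_id += 1
            self._cond.notify_all()   # wake waiters so the next one can go


def run_ordered(chunks, out):
    """Start one thread per chunk (deliberately in reverse order), then join."""
    gate = TurnGate()
    threads = [threading.Thread(target=gate.write_in_turn, args=(cid, text, out))
               for cid, text in reversed(list(enumerate(chunks)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Even though the threads start in reverse order, the output comes out in chunk order, and the threads that are not yet due spend their waiting time asleep rather than spinning.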

Output size for 1e10 is 4.6 GB. Be sure to pipe it to a command (e.g. cksum) or redirect it to /dev/null.

# Algorithm3

$ ./bin/algorithm3.pl 1e10 -p >/dev/null
Seconds: 0.743

$ ./demos/primes1.gcc 1e10 -p >/dev/null
Seconds: 10.249

$ ./demos/primes1.clang 1e10 -p >/dev/null
Seconds: 12.696

$ ./demos/primes1.nvc 1e10 -p >/dev/null
Seconds: 14.326

$ ./demos/primes2 1e10 -p >/dev/null
Seconds: 12.369

# Primesieve

# the primesieve binary uses one core when -p is given
$ time /usr/local/bin/primesieve 1e10 -p >/dev/null
Seconds: 14.379

$ ./bin/primesieve.pl 1e10 -p >/dev/null
Seconds: 0.680

$ ./demos/primes3.gcc 1e10 -p >/dev/null
Seconds: 7.145

$ ./demos/primes3.clang 1e10 -p >/dev/null
Seconds: 8.826

$ ./demos/primes3.nvc 1e10 -p >/dev/null
Seconds: 11.249

$ ./demos/primes4 1e10 -p >/dev/null
Seconds: 8.597
