PerlMonks
Re^2: [OT] The Long List is Long resurrected
by marioroy (Prior) on Apr 08, 2024 at 00:46 UTC ( [id://11158747]=note )
> What I really like is that your code is mostly standard C++ that would appear to just work on just about any modern hardware. Is that right?

Replacing std::mutex with spinlock_mutex resolved the issue with nvc++ taking over 40 seconds to build llil4hmap.cc and llil4emh.cc. The slow build was peculiar, so I reached out to NVIDIA about it. All is well otherwise. The binary performs similarly to the one built with clang++.

> Would your C++ code automatically scale when run on a beast GPGPU machine with, say, six high end NVIDIA graphics cards?

Regarding llil4map/vec, the first step is replacing the OpenMP directives with C++17 (or higher) parallel and vector concurrency via execution policies. Get properties may not run on the GPU, but there is no reason why sorting cannot. However, the GPU may lack sufficient memory capacity for the LLiL challenge.

From the article: "The GPU version also uses the parallel execution policy, but is compiled with nvc++ and the -stdpar compiler option."
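To make the execution-policy idea concrete, here is a minimal sketch of a sort step written with a C++17 parallel algorithm instead of an OpenMP directive. The `kv_pair` type, the `sort_records` function, and the ordering (count descending, then key ascending) are assumptions for illustration, not code taken from llil4map or llil4vec.

```cpp
#include <algorithm>
#include <cstdint>
#include <execution>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: (key, count). Not the type used in llil4map.
using kv_pair = std::pair<std::string, std::uint32_t>;

// Sort with a parallel + vectorized execution policy.
// Assumed ordering: count descending, then key ascending.
void sort_records(std::vector<kv_pair>& v) {
    std::sort(std::execution::par_unseq, v.begin(), v.end(),
        [](const kv_pair& a, const kv_pair& b) {
            return a.second != b.second ? a.second > b.second
                                        : a.first < b.first;
        });
}
```

On a CPU this should build with any C++17 compiler (with GCC, the parallel backend may require linking against TBB). Per the article quoted above, the GPU build uses the same policy and is compiled with nvc++ and -stdpar. Whether a string-keyed record like this one is a good fit for the GPU is not something this sketch shows.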
Accelerating Standard C++ with GPUs Using stdpar

See also C++ Parallel Algorithms. The NVIDIA HPC SDK is currently version 23.9, released recently.
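On the std::mutex to spinlock_mutex swap mentioned at the top: a spinlock only needs `lock()` and `unlock()` to work with `std::lock_guard`. The sketch below is a hypothetical minimal version built on `std::atomic_flag`; the names `spinlock_sketch` and `count_with_spinlock` are made up here, and this is not the spinlock_mutex used in llil4hmap.cc.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock; not the spinlock_mutex from llil4hmap.cc.
class spinlock_sketch {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        // Spin until the flag was previously clear, yielding between tries.
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }
};

// Increment a shared counter from several threads under the spinlock.
long count_with_spinlock(int threads, int iters) {
    spinlock_sketch mtx;
    long total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<spinlock_sketch> guard(mtx);
                ++total;
            }
        });
    for (auto& th : pool) th.join();
    return total;
}
```

Because the class satisfies the BasicLockable requirements, it can stand in wherever the code takes a `std::lock_guard` on a `std::mutex`. A spinlock tends to suit short critical sections; for long waits a blocking mutex is usually the better choice.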