I think https://chrisarg.github.io/Killing-It-with-PERL/2024/07/07/The-Quest-For-Performance-Part-II-PerlVsPython.md.html and its companion posts make a convincing case that Perl (especially with PDL) can be a capable, high-performance adjunct to other programming languages and tools in data science and scientific computing more broadly.
To complement the article, I created demonstrations that run the same in-place benchmark on either the CPU or the GPU.
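The sources of the scripts below are not reproduced here, so as a rough illustration of their shape, here is a minimal sketch of such a benchmark loop in plain NumPy: repeatedly apply an elementwise operation in place and report each pass in microseconds. The array size and the choice of operation (sqrt) are assumptions for illustration; the real demos use PDL, Numba, and Taichi to get multi-threaded execution, which plain NumPy ufuncs do not provide.

```python
import time
import numpy as np

# Assumed parameters: the actual scripts' array size and elementwise
# operation are not shown above, so these are placeholders.
N = 1_000_000
x = np.random.rand(N)

print("=" * 50)
print("In place in (Python NumPy - CPU sketch)")
print("=" * 50)

timings = []
for _ in range(10):
    t0 = time.perf_counter()
    np.sqrt(x, out=x)  # in-place elementwise transform, no temporary copy
    timings.append((time.perf_counter() - t0) * 1e6)  # seconds -> µs
    print(f"{timings[-1]:.3f} µs")
```

The `out=x` argument is what makes the operation in place: the result is written back into the input buffer instead of allocating a fresh array on every pass.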
CPU results captured on an AMD Ryzen Threadripper 3970X.
$ perl pdl_cpu.pl
==================================================
In place in (Perl PDL - CPU multi-threaded)
==================================================
33974.886 µs
37286.043 µs
36175.966 µs
32786.131 µs
34714.937 µs
32924.175 µs
33201.933 µs
33097.029 µs
33265.114 µs
33904.076 µs
$ python numba_prange.py
==================================================
In place in (Python Numba - CPU multi-threaded)
==================================================
30696.392 µs
37623.167 µs
32857.656 µs
26406.765 µs
26831.150 µs
26624.441 µs
26728.153 µs
26698.828 µs
26748.657 µs
26932.001 µs
$ python taichi_cpu.py
[Taichi] version 1.8.0, llvm 15.0.4, commit 37a05638, linux, python 3.10.13
==================================================
In place in (Python Taichi - CPU multi-threaded)
==================================================
[Taichi] Starting on arch=x64
76603.651 µs
45389.175 µs
37540.913 µs
29339.075 µs
28975.725 µs
29037.952 µs
30524.731 µs
29897.928 µs
30224.562 µs
28925.419 µs
GPU results captured on an NVIDIA GeForce RTX 3070.
$ python cupy_blas.py
==================================================
In place in (Python CuPy - GPU cuBLAS)
==================================================
6246.805 µs
24.557 µs
17.881 µs
17.643 µs
17.643 µs
17.643 µs
17.643 µs
17.405 µs
17.405 µs
17.405 µs
$ python numba_cuda.py
==================================================
In place in (Python Numba - GPU CUDA)
==================================================
182.629 µs
40.054 µs
30.279 µs
28.133 µs
27.895 µs
27.657 µs
27.418 µs
26.941 µs
26.941 µs
26.703 µs
$ python taichi_cuda.py
[Taichi] version 1.8.0, llvm 15.0.4, commit 37a05638, linux, python 3.10.13
==================================================
In place in (Python Taichi - GPU CUDA)
==================================================
[Taichi] Starting on arch=cuda
15263.557 µs
51.975 µs
28.133 µs
25.034 µs
24.080 µs
26.464 µs
26.226 µs
24.557 µs
24.080 µs
28.133 µs
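Note that in every GPU log above the first iteration is dramatically slower than the rest; this is warm-up cost (kernel JIT compilation, context and library initialization), so the steady-state numbers are the ones to compare. Using the CuPy timings from the log above, a quick calculation shows how large that effect is:

```python
# Timings (µs) copied from the cupy_blas.py run above
cupy_us = [6246.805, 24.557, 17.881, 17.643, 17.643,
           17.643, 17.643, 17.405, 17.405, 17.405]

first = cupy_us[0]
steady = sum(cupy_us[1:]) / len(cupy_us[1:])  # mean, first call excluded

print(f"first call: {first:.1f} µs")
print(f"steady state: {steady:.1f} µs")
print(f"warm-up overhead: {first / steady:.0f}x")
```

The first call comes out roughly two orders of magnitude slower than the steady state, which is why benchmark harnesses conventionally discard (or separately report) the warm-up iteration.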