Re^2: Solving the Long List is Long challenge - C++ vs Perl

by marioroy (Prior)
on Jul 14, 2023 at 14:26 UTC


in reply to Re: Solving the Long List is Long challenge - C++ vs Perl
in thread Solving the Long List is Long challenge, finally?

I ran a test with more than 1 billion lines (312 big files) to compare the various demonstrations, mainly on how they handle duplicate keys. I find it impressive that Tie::Hash::DBD (SQLite) keeps up with DB_File. Even more impressive are Tokyo Cabinet and Kyoto Cabinet, which provide the "addint" and "increment" APIs, respectively.
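An increment-style call matters here because most lines in this run repeat a key that was already seen, so the hot path is "add to an existing count". Below is a minimal in-memory sketch of that pattern in C++. It assumes tab-separated `key<TAB>count` input lines, and the names `CountMap` and `add_line` are hypothetical; it is not the code of any program timed in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical names for illustration; not taken from llil4umap or the Perl scripts.
using CountMap = std::unordered_map<std::string, std::uint64_t>;

// Parse one "key<TAB>count" line (assumed format) and add the count to the
// running total for that key. Returns false, leaving the map untouched, if
// the line has no tab, has an empty count, or the count is not all digits.
bool add_line(CountMap& counts, const std::string& line) {
    const std::size_t tab = line.find('\t');
    if (tab == std::string::npos || tab + 1 >= line.size()) return false;

    std::uint64_t n = 0;
    for (std::size_t i = tab + 1; i < line.size(); ++i) {
        const char c = line[i];
        if (c < '0' || c > '9') return false;
        n = n * 10 + static_cast<std::uint64_t>(c - '0');
    }

    // operator[] value-initializes a missing key to 0, so a first sighting
    // and a duplicate both take the same "increment" path.
    counts[line.substr(0, tab)] += n;
    return true;
}
```

Feeding it "foo\t3", "bar\t1", and "foo\t4" leaves two keys, with "foo" at 7. A database-backed increment call plays the same role, except that the read-modify-write happens inside the store instead of in the caller.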

Perl solutions:

Note: All tasks run in parallel except for "sort packed data".

$ perl llilsql.pl --threads=48 --maps=max \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
Tie::Hash::DBD SQLite database - start
fixed string length=12, threads=48, maps=128
get properties   : 321.814 secs   3.408 mil QPS
pack properties  :   1.621 secs
sort packed data :   6.873 secs
write stdout     :   1.651 secs
total time       : 331.978 secs
count lines      : 1096742400
count unique     : 79120065
3625599930 791200650

$ perl llildbf.pl --threads=48 --maps=max \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
DB_File B-tree database - start
fixed string length=12, threads=48, maps=128
get properties   : 314.637 secs   3.486 mil QPS
pack properties  :   2.074 secs
sort packed data :   6.690 secs
write stdout     :   1.677 secs
total time       : 325.097 secs
count lines      : 1096742400
count unique     : 79120065
3625599930 791200650

$ perl lliltch.pl --threads=48 --maps=max \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
Tokyo Cabinet hash database - start
fixed string length=12, threads=48, maps=128
get properties   : 120.170 secs   9.127 mil QPS
pack properties  :   3.312 secs
sort packed data :   6.872 secs
write stdout     :   1.712 secs
total time       : 132.086 secs
count lines      : 1096742400
count unique     : 79120065
3625599930 791200650

$ perl llilkch.pl --threads=48 --maps=max \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
Kyoto Cabinet hash database - start
fixed string length=12, threads=48, maps=128
get properties   : 139.692 secs   7.851 mil QPS
pack properties  :   2.476 secs
sort packed data :   6.974 secs
write stdout     :   1.635 secs
total time       : 150.795 secs
count lines      : 1096742400
count unique     : 79120065
3625599930 791200650

C++ for comparison:

QPS is queries per second for the "get properties" phase. The right column shows results using the April 2024 release.

$ NUM_THREADS=48 ./llil4umap \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
llil4umap (fixed string length=12) start
use OpenMP            73.939 mil QPS   96.757 mil QPS
get properties        14.833 secs      11.335 secs
map to vector          4.343 secs       1.410 secs
vector stable sort     0.471 secs       0.436 secs
write stdout           1.413 secs       0.297 secs
total time            28.107 secs      13.480 secs
count lines     1096742400
count unique    79120065
3625599930 791200650

$ NUM_THREADS=48 ./llil4hmap \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
llil4hmap (fixed string length=12) start
use OpenMP           123.660 mil QPS  142.527 mil QPS
get properties         8.869 secs       7.695 secs
map to vector          0.235 secs       0.373 secs
vector stable sort     0.477 secs       0.413 secs
write stdout           1.445 secs       0.315 secs
total time            11.028 secs       8.798 secs
count lines     1096742400
count unique    79120065
3625599930 791200650

$ NUM_THREADS=48 ./llil4emh \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    in/biga* in/biga* in/biga* in/biga* in/biga* in/biga* \
    | cksum
llil4emh (fixed string length=12) start
use OpenMP           150.239 mil QPS  176.609 mil QPS
get properties         7.300 secs       6.219 secs
map to vector          0.186 secs       0.378 secs
vector stable sort     0.478 secs       0.439 secs
write stdout           1.410 secs       0.305 secs
total time             9.458 secs       7.334 secs
count lines     1096742400
count unique    79120065
3625599930 791200650
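The QPS figures reported in this post are consistent with dividing "count lines" by the "get properties" time, treating each input line as one query. A small sketch of that arithmetic follows; the function name `mil_qps` is hypothetical and is not part of any program shown here.

```cpp
#include <cstdint>

// Millions of queries per second, assuming one query per input line:
// lines processed divided by seconds spent in "get properties".
double mil_qps(std::uint64_t lines, double get_properties_secs) {
    return static_cast<double>(lines) / get_properties_secs / 1e6;
}
```

For example, `mil_qps(1096742400, 14.833)` gives roughly 73.94, matching the left-column llil4umap figure, and `mil_qps(1096742400, 321.814)` gives roughly 3.41, matching the Tie::Hash::DBD run.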
