PerlMonks  

Rosetta Code: Long List is Long

by eyepopslikeamosquito (Bishop)
on Nov 30, 2022 at 22:27 UTC (#11148465=perlmeditation)

I've long found it fun to implement the same algorithm in different languages, especially Perl and C++ ... and then sit back and reflect on the lessons learned ... so when Long list is long appeared recently, I felt it was short and interesting enough to make an excellent Rosetta code node.

Solutions to this problem must read a number of input LLiL-format files (given as command line arguments) and write a single merged LLiL-format file to stdout. The LLiL-format is described in the comments at the top of llil.pl below.

In the interests of keeping the code as short and fast as possible, you may assume the input LLiL files are well-formed. For example, you don't need to check for and remove leading and trailing whitespace on each line. The sample solutions given below in Perl and C++ should clarify program requirements.

Please feel free to respond away with solutions to this problem in your favourite programming language and to offer suggested improvements to my sample Perl and C++ solutions below.

Perl Solution

Here's my Perl solution, heavily influenced by responses to Long list is long, especially kcott's concise and clear solution:

# llil.pl
# Example run: perl llil.pl tt1.txt tt2.txt >oo1.tmp

use strict;
use warnings;

# ----------------------------------------------------------------------
# LLiL specification
# ------------------
# A LLiL-format file is a text file.
# Each line consists of a lowercase name, a TAB character and a non-negative integer count.
# That is, each line must match : ^[a-z]+\t\d+$
# For example, reading the LLiL-format files, tt1.txt containing:
#   camel\t42
#   pearl\t94
#   dromedary\t69
# and tt2.txt containing:
#   camel\t8
#   hello\t12345
#   dromedary\t1
# returns this hashref:
#   $hash_ret{"camel"}     = 50
#   $hash_ret{"dromedary"} = 70
#   $hash_ret{"hello"}     = 12345
#   $hash_ret{"pearl"}     = 94
# That is, values are added for items with the same key.
#
# To get the required LLiL text, you must sort the returned hashref
# descending by value and insert a TAB separator:
#   hello\t12345
#   pearl\t94
#   dromedary\t70
#   camel\t50
# To make testing via diff easier, we further sort ascending by name
# for lines with the same value.
# ----------------------------------------------------------------------

# Function get_properties
# Read a list of LLiL-format files
# Return a reference to a hash of properties
sub get_properties {
    my $files = shift;    # in: reference to a list of LLiL-format files
    my %hash_ret;         # out: reference to a hash of properties
    for my $fname ( @{$files} ) {
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        while (<$fh>) {
            chomp;
            my ($word, $count) = split /\t/;
            $hash_ret{$word} += $count;
        }
        close($fh) or die "error: close '$fname': $!";
    }
    return \%hash_ret;
}

# ----------------- mainline ------------------------------------------
@ARGV or die "usage: $0 file...\n";
my @llil_files = @ARGV;
warn "llil start\n";
my $tstart1 = time;
my $href = get_properties( \@llil_files );
my $tend1 = time;
my $taken1 = $tend1 - $tstart1;
warn "get_properties : $taken1 secs\n";
my $tstart2 = time;
for my $key ( sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href} ) {
    print "$key\t$href->{$key}\n";
}
my $tend2 = time;
my $taken2 = $tend2 - $tstart2;
my $taken  = $tend2 - $tstart1;
warn "sort + output  : $taken2 secs\n";
warn "total          : $taken secs\n";

What makes this problem interesting to me is the requirement to sort the hash in descending order by value:

sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href}
because the performance of such a sort may suffer when dealing with huge files (after all, performance was the reason for the OP's question in the first place).

I'm hoping solving this problem in multiple languages will be fun and instructive -- and perhaps give us insight into how performance changes as the number of items increases.

C++ Solution

Here's my C++ solution:

// llil.cpp. C++ 11 version of Perl llil.pl.
// g++ compile on Linux:
//   g++ -o llil -std=c++11 -Wall -O3 llil.cpp
// This g++ command also works with mingw C++ compiler (https://sourceforge.net/projects/mingw-w64)
// that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.exe).
// Example run: llil tt1.txt tt2.txt >out.txt
// Uncomment next line to sort by creating a multimap (instead of via the sort function)
// #define LLIL_SORT_VIA_MULTIMAP_L 1

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <cstdio>
#include <string>
#include <vector>
#include <set>
#include <map>
#include <unordered_map>
#include <algorithm>
#include <utility>
#include <iostream>
#include <fstream>
#include <sstream>

// ------------------------------------------------------------
// Performance hacks to speed up IO.
// See https://www.reddit.com/r/rust/comments/9xedap/how_to_achieve_fast_stdinstdout_io_suitable_for/
// Avoid flush by using "\n" instead of std::endl (this made a big difference in this program!)
// This one made almost no difference in this program:
//   const auto io_speed_up = [](){
//      std::ios::sync_with_stdio(false);
//      std::cin.tie(nullptr);
//      return nullptr;
//   }();
// ------------------------------------------------------------

using str_int_type     = std::pair<std::string, int>;
using map_str_int_type = std::unordered_map<std::string, int>;
using vec_str_int_type = std::vector<str_int_type>;

// Mimic the Perl get_properties subroutine
static void get_properties(
   int               nfiles,     // in: the number of input files
   char*             fname[],    // in: the input file names
   map_str_int_type& hash_ret)   // out: a hash of properties
{
   for (int i = 0; i < nfiles; ++i) {
      std::ifstream llfile(fname[i]);
      if (!llfile) {
         std::cerr << "Error opening '" << fname[i] << "'\n";
         return;
      }
      for (std::string line; std::getline(llfile, line); ) {
         std::string word, count;
         std::stringstream ssline(line);
         std::getline(ssline, word, '\t');
         std::getline(ssline, count);
         hash_ret[word] += std::stoi(count);
      }
   }
}

#ifdef LLIL_SORT_VIA_MULTIMAP_L
// Convert a std::unordered_map<key, value> to a std::multimap<value, key>
static std::multimap<int, std::string> invert_map(const std::unordered_map<std::string, int>& m)
{
   std::multimap<int, std::string> mm;
   for (std::unordered_map<std::string, int>::const_iterator it = m.begin(); it != m.end(); ++it) {
      mm.insert( std::make_pair(it->second, it->first) );
   }
   return mm;
}
#endif

int main(int argc, char* argv[])
{
   if (argc < 2) {
      std::cerr << "usage: llil file1 file2 ... >out.txt\n";
      return 1;
   }
#ifdef LLIL_SORT_VIA_MULTIMAP_L
   std::cerr << "llil start (multimap version)\n";
#else
   std::cerr << "llil start (sort version)\n";
#endif
   time_t tstart1 = ::time(NULL);

   // Create the hash of properties
   map_str_int_type hash_ret;
   get_properties(argc - 1, &argv[1], hash_ret);
   time_t tend1 = ::time(NULL);
   long taken1 = static_cast<long>(::difftime(tend1, tstart1) + 0.5);
   std::cerr << "get_properties : " << taken1 << " secs\n";

   // Sort descending by value, i.e. mimic this Perl code in C++:
   //   sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href}
   time_t tstart2 = ::time(NULL);
#ifdef LLIL_SORT_VIA_MULTIMAP_L
   std::multimap<int, std::string> newmap = invert_map(hash_ret);
   for (std::multimap<int, std::string>::reverse_iterator it = newmap.rbegin(); it != newmap.rend(); ++it) {
      std::cout << it->second << '\t' << it->first << "\n";
   }
#else
   vec_str_int_type v( hash_ret.begin(), hash_ret.end() );
   std::sort( v.begin(), v.end(),
      [](const str_int_type& left, const str_int_type& right) {
         return right.second != left.second ? right.second < left.second
                                            : left.first < right.first;
      }
   );
   for ( const std::pair<const std::string, int>& n : v ) {
      std::cout << n.first << '\t' << n.second << "\n";
   }
#endif
   time_t tend2 = ::time(NULL);
   long taken2 = static_cast<long>(::difftime(tend2, tstart2) + 0.5);
   long taken  = static_cast<long>(::difftime(tend2, tstart1) + 0.5);
   std::cerr << "sort + output  : " << taken2 << " secs\n";
   std::cerr << "total          : " << taken  << " secs\n";
   return 0;
}

This is a straightforward translation of my Perl solution into C++ 11. I chose a C++ std::unordered_map because it seems the closest container to a Perl hash.

Unlike Perl, you can't sort a std::unordered_map on the fly in a one-liner. So I tried two different ways to do it:

  • Copy the std::unordered_map to a std::vector and sort it via the std::sort algorithm
  • Copy the std::unordered_map<key, value> to a std::multimap<value, key> (see std::multimap)
Alternative suggestions welcome.

Testing Correctness and Measuring Performance

To get a feel for performance and correctness, I used this simple and crude Perl program gen-llil.pl:

# gen-llil.pl
# Crude program to generate a big LLiL test file to use in benchmarks
# perl gen-llil.pl big2.txt 200 3 - produces a test file with size = 35,152,000 bytes

use strict;
use warnings;
use autodie;

{
    my $ordmin = ord('a');
    my $ordmax = ord('z') + 1;

    # Generate a random word
    sub gen_random_word {
        my $word  = shift;   # word prefix
        my $nchar = shift;   # the number of random chars to append
        for my $i (1 .. $nchar) {
            $word .= chr( $ordmin + int( rand($ordmax - $ordmin) ) );
        }
        return $word;
    }
}

sub create_test_file {
    my $fname   = shift;
    my $count   = shift;
    my $wordlen = shift;
    open( my $fh_out, '>', $fname );
    for my $c ( 'aaa' .. 'zzz' ) {
        for my $i (1 .. $count) {
            print {$fh_out} gen_random_word( $c, $wordlen ) . "\t" . 1 . "\n";
        }
    }
}

my $outfile = shift;
my $count   = shift;
my $wordlen = shift;
$outfile or die "usage: $0 outfile count wordlen\n";
$count   or die "usage: $0 outfile count wordlen\n";
print "generating test file '$outfile' with count '$count'\n";
create_test_file($outfile, $count, $wordlen);
print "file size=", -s $outfile, "\n";

I generated three biggish test files (size 35,152,000 bytes) with:

perl gen-llil.pl big1.txt 200 3
perl gen-llil.pl big2.txt 200 3
perl gen-llil.pl big3.txt 200 3
then ran the Perl version:
perl llil.pl big1.txt big2.txt big3.txt >perl.tmp
and the C++ version:
llil big1.txt big2.txt big3.txt >cpp.tmp
and verified they produced the same result:
diff perl.tmp cpp.tmp

Performance Results

To get the ball rolling, here are my initial performance results for the Perl and C++ solutions given above running Strawberry Perl on my Windows Laptop PC with 8 GB RAM.

Perl (max Windows Private Bytes for the Perl process was about 2,236,648K)

> perl llil.pl big1.txt big2.txt big3.txt >perl.tmp
llil start
get_properties : 11 secs
sort + output  : 74 secs
total          : 85 secs

C++ (max Windows Private Bytes for the llil process was about 1,176,048K)

> llil big1.txt big2.txt big3.txt >cpp.tmp
llil start (sort version)
get_properties : 9 secs
sort + output  : 7 secs
total          : 16 secs

Verify correctness:

> diff cpp.tmp perl.tmp

I'd also find it interesting and educational to compare solutions to this little problem in a variety of languages. So please feel free to respond away in your favourite language!

Some Other Rosetta Code Nodes

Updated: Fixed a line that had been left commented out for debugging in llil.pl, one minute after posting. :)

Re: Rosetta Code: Long List is Long
by choroba (Cardinal) on Nov 30, 2022 at 23:25 UTC
    As usual, if speed is your concern, you can trade it for space.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    warn "start\n";
    my $tstart1 = time;

    my (%by_count, %by_word);
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/, $_, 2;
        if (exists $by_word{$k}) {
            $v += $by_word{$k};
            delete $by_count{ $by_word{$k} }{$k};
        }
        undef $by_count{$v}{$k};
        $by_word{$k} = $v;
    }
    my $tend1 = time;
    warn "get properties: ", $tend1 - $tstart1, " secs\n";

    my $tstart2 = time;
    for my $count (sort { $b <=> $a } keys %by_count) {
        say "$_\t$count" for sort keys %{ $by_count{$count} };
    }
    my $tend2 = time;
    warn "sort + output: ", $tend2 - $tstart2, " secs\n";
    warn "total: ", $tend2 - $tstart1, " secs\n";

    Comparison?

    llil start
    get_properties : 13 secs
    sort + output  : 85 secs
    total          : 98 secs

    start
    get properties: 21 secs
    sort + output: 25 secs
    total: 46 secs
    The output is identical.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Excellent work choroba! On my machine I see:

      get properties: 17 secs
      sort + output: 18 secs
      total: 35 secs
      more than twice as fast as my original Perl version of 85 secs ... with Windows Private Bytes increasing from 2,236,648K to 3,516,476K.

      The following is my fun spin using dualvar, based on choroba.pl. I tried to minimize memory consumption by using one hash and one array. The memory consumption is similar to llil.pl and performance is slightly faster than choroba.pl.

      See also, parallel solution update.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use feature qw{ say };
      use Scalar::Util qw{ dualvar };

      warn "start\n";
      my $tstart1 = time;

      our (%by_word, @data);
      while (<>) {
          chomp;
          my ($k, $v) = split /\t/, $_;
          $by_word{$k} += $v;
      }
      my $tend1 = time;
      warn "get properties: ", $tend1 - $tstart1, " secs\n";

      my $tstart2 = time;
      while (my ($k, $v) = each %by_word) {
          push @data, dualvar($v, $k);
      }
      # output array of dualvars; sorted by number, string
      say "$_\t".(0+$_) for sort { $b <=> $a } sort { $a cmp $b } @data;
      my $tend2 = time;
      warn "sort + output: ", $tend2 - $tstart2, " secs\n";
      warn "total: ", $tend2 - $tstart1, " secs\n";
        Interesting!

        What really surprised me was how slow it became once I tried to sort the data just once:

        sort { $b <=> $a || $a cmp $b } @data;
        Before:
        sort + output: 19 secs
        After:
        sort + output: 32 secs


      On macOS, exiting the script requires 15 additional seconds. This is a great test case for comparing "my" vs "our".

      my (%by_count, %by_word);   # slow cleanup/exiting

      $ time perl choroba.pl big1.txt big2.txt big3.txt >cpp.tmp
      start
      get properties: 21 secs
      sort + output: 25 secs
      total: 46 secs

      real 1m2.072s
      user 1m0.182s
      sys  0m1.885s

      our (%by_count, %by_word);   # fast cleanup/exiting

      $ time perl choroba.pl big1.txt big2.txt big3.txt >cpp.tmp
      start
      get properties: 21 secs
      sort + output: 25 secs
      total: 46 secs

      real 0m47.062s
      user 0m45.505s
      sys  0m1.549s
        What Perl version do you run? The final garbage collecting takes about 6 secs on my Linux machine in both 5.26.1 and blead.

Re: Rosetta Code: Long List is Long -- parallel
by marioroy (Parson) on Dec 01, 2022 at 06:37 UTC

    Greetings, eyepopslikeamosquito

    > Please feel free to respond away with solutions...

    I made a parallel demonstration. The results are taken from a 32-core Linux box. On Windows, run Cygwin's Perl for best performance.

    Edit1: Updated parallel code; now passing less than 26 workers.
    Edit2: Added results for max_workers => 4, 7, 13, and 26.
    Edit3: Minimized memory consumption; one hash, one array of dualvars.
    Edit4: Replaced regular expression with substr($_,0,1) eq $char.

    Results

    # Modification: our (%by_count, %by_word);
    $ time perl choroba_mod.pl big1.txt big2.txt big3.txt >oo1.txt
    start
    get properties: 9 secs
    sort + output: 14 secs
    total: 23 secs

    real 0m 23.083s
    user 0m 22.568s
    sys  0m 0.491s

    # Run parallel on physical cores: max_workers => 4
    $ time taskset -c 0-31 perl mce.pl big1.txt big2.txt big3.txt >oo2.txt
    start
    get properties + pre-sort: 14 secs
    final sort + output: 2 secs
    total: 16 secs

    real 0m 15.434s
    user 0m 52.223s
    sys  0m 0.824s

    # Run parallel on physical cores: max_workers => 7
    $ time taskset -c 0-31 perl mce.pl big1.txt big2.txt big3.txt >oo3.txt
    start
    get properties + pre-sort: 8 secs
    final sort + output: 2 secs
    total: 10 secs

    real 0m 10.429s
    user 0m 53.032s
    sys  0m 0.982s

    # Run parallel on physical cores: max_workers => 13
    $ time taskset -c 0-31 perl mce.pl big1.txt big2.txt big3.txt >oo4.txt
    start
    get properties + pre-sort: 6 secs
    final sort + output: 1 secs
    total: 7 secs

    real 0m 7.262s
    user 0m 53.556s
    sys  0m 1.259s

    # Run parallel on physical cores: max_workers => 26
    $ time taskset -c 0-31 perl mce.pl big1.txt big2.txt big3.txt >oo5.txt
    start
    get properties + pre-sort: 4 secs
    final sort + output: 2 secs
    total: 6 secs

    real 0m 6.420s
    user 0m 55.158s
    sys  0m 1.413s

    # Verify correctness:
    $ diff cpp.tmp oo1.txt   # choroba
    $ diff cpp.tmp oo2.txt   # MCE 4 workers
    $ diff cpp.tmp oo3.txt   # MCE 7 workers
    $ diff cpp.tmp oo4.txt   # MCE 13 workers
    $ diff cpp.tmp oo5.txt   # MCE 26 workers

    Parallel code

    Basically, workers process and gather letters "a" through "z" in order. I was hoping to gather an array of dualvars, but forgot that serialization removes the numeric part.

Re: Rosetta Code: Long List is Long
by Tux (Canon) on Dec 01, 2022 at 09:18 UTC

    Not a real contestant, but a different approach using TSV/XS (just because I could not resist):

    $ cat -te tt1.txt
    camel^I42$
    pearl^I94$
    dromedary^I69$
    $ cat -te tt2.txt
    camel^I8$
    hello^I12345$
    dromedary^I1$
    $ perl -MText::CSV_XS=csv -wE'csv(in=>*ARGV,sep=>"\t",out=>undef,on_in=>sub{$x{$_[1][0]}+=$_[1][1]});printf"%-20s %6d\n",$_,$x{$_}for sort keys%x' tt?.txt
    camel                    50
    dromedary                70
    hello                 12345
    pearl                    94

    Enjoy, Have FUN! H.Merijn
Re: Rosetta Code: Long List is Long -- oneliner
by Discipulus (Abbot) on Dec 01, 2022 at 08:00 UTC
    Hello eyepopslikeamosquito,

    > So please feel free to respond away in your favourite language!

    If oneliner is a language..

    cat lill01.txt
    camel 42
    pearl 94
    dromedary 69

    cat lill02.txt
    camel 8
    hello 12345
    dromedary 1
    perl 94

    perl -lane "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" lill01.txt lill02.txt

    # or shortened by one char because perl -a implies -n
    perl -lae "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" lill01.txt lill02.txt
    hello 12345
    pearl 94
    perl 94
    dromedary 70
    camel 50

    Deparsed for newcomers..

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Rosetta Code: Long List is Long -- Python
by parv (Vicar) on Dec 02, 2022 at 10:26 UTC

    Environment: FreeBSD stable/13 (amd64) as guest in VirtualBox 6.1.30 on Windows 10 host; 2 CPUs & ~8 GB of RAM are allocated to the VM on ThinkPad X260 (Intel i5-6500U CPU, 16 GB RAM). perl is 5.32.1; python is 3.10.8.

    I may attempt a Rust version, which I am currently learning.

    Your Perl version takes ~7-8 minutes (its time to collect data is about the same as my Python version's). Mind that the time values are from just one run each.

    llil start
    get_properties : 44 secs
    sort + output  : 429 secs
    total          : 473 secs

    From choroba's attempt ...

    start
    get properties: 60 secs
    sort + output: 63 secs
    total: 123 secs

    From my Python attempt (generally takes ~3-4 minutes) ...

    # Manually rounded off after posting.
    collect time             :  37.6 s
    sort_via_cmp_to_key time : 115.1 s
    sort+format time         : 115.1 s
    total processing time    : 152.7 s
    output time              :  22.0 s
    total time               : 174.7 s

    ... most of the time savings (~10-20 s) over my earlier Python attempts came from ...

    • printing the output as one string instead of printing each line in a loop;
    • sorting the dict via keys instead of sorting an iterator of tuples created from the "dict";
    • and using subtraction to compare the numbers instead of explicitly returning one of -1, 0, 1 based on [><=] comparison.

    Later I realized there was one extraneous loop; removed now, but the patch keeps the history.

    ... said patch (this filler text is to better separate from preceding code portion) ...

      This is neat, parv. I compared Python 3.10.7 against Pyston-lite and Pyston.

      Python:

      $ python3 llil.py big1.txt big2.txt big3.txt > out1.txt
      collect time            :  5.2 s
      sort_via_cmp_to_key time: 17.7 s
      sort+format time        : 17.7 s
      total processing time   : 22.9 s
      output time             :  3.5 s
      total time              : 26.4 s

      Python with Pyston-lite loaded automatically:

      # Install pyston_lite_autoload to ~/.local/lib/python3.10/site-packages/.
      $ python3 -m pip install --user pyston_lite_autoload
      $ python3 llil.py big1.txt big2.txt big3.txt > out2.txt
      collect time            :  4.4 s
      sort_via_cmp_to_key time: 15.8 s
      sort+format time        : 15.8 s
      total processing time   : 20.2 s
      output time             :  3.2 s
      total time              : 23.4 s

      Pyston 2.3.5:

      I also tried Pyston 2.3.5. But first, I needed to modify the function declarations to resolve "TypeError: 'type' object is not subscriptable".

      $ diff llil.py llil2.py
      53c53
      < def collect( data_list :list ) ->dict[ str, int ]:
      ---
      > def collect( data_list :list ) ->dict:
      87c87
      < def process( cat_count :dict[ str, int ] ) ->Generator[ str, None, None ]:
      ---
      > def process( cat_count :dict ) ->Generator:
      110c110
      < def sort_via_cmp_to_key( cat_count :dict[ str, int ] ) ->list[ str ]:
      ---
      > def sort_via_cmp_to_key( cat_count :dict ) ->list:
      $ ./pyston_2.3.5/bin/pyston3 llil2.py big1.txt big2.txt big3.txt > out3.txt
      collect time            :  3.7 s
      sort_via_cmp_to_key time: 12.2 s
      sort+format time        : 12.2 s
      total processing time   : 15.9 s
      output time             :  3.0 s
      total time              : 18.8 s

      Look at Pyston go; taking Python performance to a new level :) How does Python/Pyston compare to Perl? I ran the dualvar demonstration for comparison.

      $ perl dualvar.pl big1.txt big2.txt big3.txt >out4.txt
      start
      get properties: 6 secs
      sort + output: 16 secs
      total: 22 secs

      The Perl code consumes around 2340 MB versus 1790 MB for Python.

      See also:

      Python 3.12 Goals

      Python is about to get faster

      The Pyston Blog

      Numba

        In short, could you do another Python and Pyston* run with the sort function replaced with ...

        def sort_native( cat_count ):
            once = sorted( cat_count.keys() )
            return sorted( once, key = lambda k: cat_count[ k ], reverse = True )

        ...?

        In long, before eyepopslikeamosquito posted ...

        sort { $href->{$b} <=> $href->{$a} } sort keys %{$href}

        ... I had missed noticing the sorting order. I decided to do the same for the Python (native) version instead of using the functools.cmp_to_key function. I also realized that I had been sorting the other way around (sorting keys by value, followed by sorting by key), so was not getting the expected output (which is what made me use cmp_to_key instead).

        Replacing sort_via_cmp_to_key with ...

        def sort_native( cat_count :dict[ str, int ] ) ->list[ str ]:
            """
            Returns:
              A `list` of keys sorted by decreasing order of values & increasing order of keys.

            Args:
              cat_count: A `dict` with string key & integer value.
            """
            once = sorted( cat_count.keys() )
            return sorted( once, key = lambda k: cat_count[ k ], reverse = True )

        ... reduces the sort time by a factor of ~10 (to ~11-16 s; it also produces the expected output) in my run environment.

        time passes, so slowly🎶 ... Putting the complete program here (~28-35 s) ...

Re: Rosetta Code: Long List is Long (Updated Solutions)
by eyepopslikeamosquito (Bishop) on Dec 05, 2022 at 09:32 UTC

    As you might expect, I've created improved versions of my original Perl and C++ solutions posted in the root node. Since I'm all out of fresh ideas to try, I'll post my improved solutions here. Further performance improvement suggestions are welcome.

    Improved Perl Version llil2.pl

    # llil2.pl
    # Example run: perl llil2.pl tt1.txt tt2.txt tt3.txt >out.txt

    use strict;
    use warnings;

    # ----------------------------------------------------------------------
    # LLiL specification
    # ------------------
    # A LLiL-format file is a text file.
    # Each line consists of a lowercase name, a TAB character and a non-negative integer count.
    # That is, each line must match : ^[a-z]+\t\d+$
    # For example, reading the LLiL-format files, tt1.txt containing:
    #   camel\t42
    #   pearl\t94
    #   dromedary\t69
    # and tt2.txt containing:
    #   camel\t8
    #   hello\t12345
    #   dromedary\t1
    # returns this hashref:
    #   $hash_ret{"camel"}     = 50
    #   $hash_ret{"dromedary"} = 70
    #   $hash_ret{"hello"}     = 12345
    #   $hash_ret{"pearl"}     = 94
    # That is, values are added for items with the same key.
    #
    # To get the required LLiL text, you must sort the returned hashref
    # descending by value and insert a TAB separator:
    #   hello\t12345
    #   pearl\t94
    #   dromedary\t70
    #   camel\t50
    # To make testing via diff easier, we further sort ascending by name
    # for lines with the same value.
    # ----------------------------------------------------------------------

    # Function get_properties
    # Read a list of LLiL-format files
    # Return a reference to a hash of properties
    sub get_properties {
        my $files = shift;    # in: reference to a list of LLiL-format files
        my %hash_ret;         # out: reference to a hash of properties
        for my $fname ( @{$files} ) {
            open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
            while (<$fh>) {
                chomp;
                my ($word, $count) = split /\t/;
                $hash_ret{$word} += $count;
            }
            close($fh) or die "error: close '$fname': $!";
        }
        return \%hash_ret;
    }

    # ----------------- mainline ------------------------------------------
    @ARGV or die "usage: $0 file...\n";
    my @llil_files = @ARGV;
    warn "llil2 start\n";
    my $tstart1 = time;
    my $href = get_properties( \@llil_files );
    my $tend1 = time;
    my $taken1 = $tend1 - $tstart1;
    warn "get_properties : $taken1 secs\n";
    my $tstart2 = time;
    # Using two sorts is waaay faster than one in Perl for some reason! (see [id://11148545])
    for my $key ( sort { $href->{$b} <=> $href->{$a} } sort keys %{$href} ) {
        print "$key\t$href->{$key}\n";
    }
    my $tend2 = time;
    my $taken2 = $tend2 - $tstart2;
    my $taken  = $tend2 - $tstart1;
    warn "sort + output  : $taken2 secs\n";
    warn "total          : $taken secs\n";

    Improved C++ Version llil2.cpp

    // llil2.cpp. C++ 11 version of Perl llil.pl.
    // llil2.cpp is faster than llil.cpp while also clarifying limits:
    //   - all keys should be less than 200 or so characters in length
    //   - numbers are 64 bit integers (max: 9,223,372,036,854,775,807)
    // g++ compile on Linux:
    //   g++ -o llil2 -std=c++11 -Wall -O3 llil2.cpp
    // This g++ command also works with mingw C++ compiler (https://sourceforge.net/projects/mingw-w64)
    // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.exe).
    // Example run: llil2 tt1.txt tt2.txt tt3.txt >out.txt

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>
    #include <ctime>
    #include <cstdio>
    #include <string>
    #include <vector>
    #include <set>
    #include <map>
    #include <algorithm>
    #include <utility>
    #include <iostream>
    #include <fstream>
    #include <sstream>

    static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, need a 64-bit compile");

    // ----------------------------------------------------------------------------
    // Crude hack to see Windows Private Bytes in Task Manager by sleeping at
    // program end (see also sleep hack at end of main)
    // #include <chrono>
    // #include <thread>

    // For some performance hacks to speed up C++ I/O see:
    //   https://www.reddit.com/r/rust/comments/9xedap/how_to_achieve_fast_stdinstdout_io_suitable_for/
    // The only one we use here is to prefer "\n" to std::endl to reduce stdout flushing
    // ----------------------------------------------------------------------------

    typedef long long llil_int_type;
    using str_int_type     = std::pair<std::string, llil_int_type>;
    using map_str_int_type = std::map<std::string, llil_int_type>;
    using vec_str_int_type = std::vector<str_int_type>;

    // Mimic the Perl get_properties subroutine ----------------------------
    // Limit line length and use lower level ANSI C functions to try to boost I/O performance
    // TODO (maybe):
    //   - reading: Try ::setvbuf(fh, NULL, _IOFBF, 65536) or some such on input files
    //   - writing: Try ::setvbuf(stdout, stdout_buf, _IOFBF, sizeof(stdout_buf)) on stdout
    //              ... or instead of writing to stdout, take an output file as a program argument
    #define MAX_LINE_LEN_L 255
    static void get_properties(
       int               nfiles,     // in: the number of input files
       char*             fname[],    // in: the input file names
       map_str_int_type& hash_ret)   // out: a hash of properties
    {
       FILE* fh;
       char line[MAX_LINE_LEN_L+1];
       char* word;
       char* count;
       for (int i = 0; i < nfiles; ++i) {
          fh = ::fopen(fname[i], "r");
          if (fh == NULL) {
             std::cerr << "Error opening '" << fname[i] << "'\n";
             return;
          }
          while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) {
             word  = ::strtok(line, "\t");
             count = ::strtok(NULL, "\n");
             hash_ret[word] += ::atoll(count);
          }
          ::fclose(fh);
       }
    }

    // ---------------------------------------------------------------------
    int main(int argc, char* argv[])
    {
       if (argc < 2) {
          std::cerr << "usage: llil2 file1 file2 ... >out.txt\n";
          return 1;
       }
       std::cerr << "llil2 start\n";
       time_t tstart1 = ::time(NULL);

       // Create the hash of properties
       map_str_int_type hash_ret;
       get_properties(argc - 1, &argv[1], hash_ret);
       time_t tend1 = ::time(NULL);
       long taken1 = static_cast<long>(::difftime(tend1, tstart1) + 0.5);
       std::cerr << "get_properties : " << taken1 << " secs\n";

       // Sort descending by value, i.e. mimic this Perl code in C++:
       //   sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href}
       time_t tstart2 = ::time(NULL);
       vec_str_int_type v( hash_ret.begin(), hash_ret.end() );
       std::sort( v.begin(), v.end(),
          [](const str_int_type& left, const str_int_type& right) {
             return right.second != left.second ? right.second < left.second
                                                : left.first < right.first;
          }
       );

       // Output the merged properties
       for ( auto const& n : v ) { std::cout << n.first << '\t' << n.second << '\n'; }
       time_t tend2 = ::time(NULL);
       long taken2 = static_cast<long>(::difftime(tend2, tstart2) + 0.5);
       long taken  = static_cast<long>(::difftime(tend2, tstart1) + 0.5);
       std::cerr << "sort + output  : " << taken2 << " secs\n";
       std::cerr << "total          : " << taken  << " secs\n";

       // Hack to see Private Bytes in Windows Task Manager (uncomment next line so process doesn't exit too quickly)
       // std::this_thread::sleep_for(std::chrono::milliseconds(90000000));
       return 0;
    }

    Performance Analysis

    Performance of my second Perl version improved from:

    get_properties : 11 secs
    sort + output  : 74 secs
    total          : 85 secs

    to:

    get_properties : 11 secs
    sort + output  : 25 secs
    total          : 36 secs

    update (much later, in llil2grt.pl) to:

    get_properties : 10 secs
    sort + output  : 20 secs
    total          : 30 secs

    Performance of my second C++ version improved from:

    get_properties : 9 secs
    sort + output  : 7 secs
    total          : 16 secs

    to:

    get_properties : 6 secs
    sort + output  : 6 secs
    total          : 12 secs

    Memory use (Windows Private Bytes) was 2,896,104K for the Perl version (update: 2,657,968K), 1,176,048K for the C++ std::unordered_map version (update: 1,218,720K for std::map).

    Update: Surprisingly, making a one line change in llil2.cpp above from:

    using map_str_int_type = std::unordered_map<std::string, llil_int_type>;
    // to (llil2a.cpp):
    using map_str_int_type = std::map<std::string, llil_int_type>;

    resulted in a significant speed improvement in llil2a (with similar Windows Private Bytes):

    get_properties : 4 secs
    sort + output  : 5 secs
    total          : 9 secs

    My second Perl version improved from 85 seconds to 36 seconds (update: much later to 30 seconds in llil2grt). Incredibly, this spectacular performance improvement is entirely due to a very surprising one line change from:

    sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href}
    to:
    sort { $href->{$b} <=> $href->{$a} } sort keys %{$href}
    When more is known of the reason for this incredible difference, I'll update this node.

    In contrast, I had to work a lot harder to improve the performance of my C++ version, switching in desperation to some ugly lower level ANSI C functions in its get_properties function. For cheap thrills, I also switched from a 32-bit to a 64-bit integer counter llil_int_type. I suspect future minor performance tweaks may come from further improving I/O (e.g. by fiddling with file buffering and buffer sizes).

    As described here, this is by far Perl's best performance so far in my three serious Rosetta nodes:

    • For the simple GoL algorithm, C++ was 12.5 times faster, memory use was 2.8 times less.
    • For the complex GoL algorithm, C++ was 212.5 times faster; memory use was 10.1 times less.
    • For the Long List is Long algorithm (this node), C++ was 3 times faster; memory use 2.2 times less.
    I suspect this is because Long List is Long seems to be mostly I/O bound, while the GoL algorithms were mostly CPU bound.

    Updated: Changed std::unordered_map to std::map in llil2.cpp above.

      Some wiser monk might correct me, but I think this difference is because "simple" blocks such as the following can be internally optimized:

      (edited: typoes corrected after reading the replies)

      my @sorted = sort @unsorted;
      my @sorted = sort { $a <=> $b } @unsorted;
      my @sorted = sort { $a cmp $b } @unsorted;

      All of these are internally optimized to something simple and efficient. Once the block gets more complicated, like having multiple statements and/or multiple expressions, every block has to be wrapped in scopes *for each call* to that sub. This is one of the reasons why the Guttman-Rosler Transform is so efficient.

      my @sorted = sort { $a->{id} <=> $b->{id} || $a->{foo} cmp $b->{foo} } @unsorted;   # SLOW
      my @sorted = map  { $_->[1] }
                   sort { $a->[0] cmp $b->[0] }
                   map  { [ pack("l>A*", $_->{id}, $_->{foo}), $_ ] } @unsorted;          # FAST

      Enjoy, Have FUN! H.Merijn
        This is one of the reasons why the Larry-Rossler is so efficient.

        I think that you are thinking of the Guttman-Rosler Transform?

        my @sorted = sort { $b->{id} <=> $b->{id} || $a->{foo} cmp $b->{foo} } @unsorted; # SLOW

        Comparing $b->{id} to itself will not work very well.

      You may have missed the dualvar solution. Sorting an array of dualvars involves zero hash lookups.

      use Scalar::Util 'dualvar';

      our @data;
      while (my ($k, $v) = each %{$href}) { push @data, dualvar($v, $k); }

      # output array of dualvars; sorted by number, string
      print "$_\t".(0+$_)."\n" for sort { $b <=> $a } sort @data;

      # llil2.pl  sort + output : 19 secs
      # llil3.pl  sort + output : 16 secs  -- dualvar

        I noticed it, but (wrongly) assumed applying its remarkable two sort trick to my original solution would have the same effect.

        For completeness, I created llil2d.pl (shown below) by applying your dualvar array trick to my original llil2.pl two-sort solution, with minimal changes. I can confirm that it is indeed about 3 seconds faster and with slightly lower memory use. Despite using Perl for 20 years, I'd never heard of dualvar before (update: oops, turns out I had :). Huge kudos to marioroy for unearthing this!

        llil2d start
        get_properties : 11 secs
        sort + output  : 22 secs
        total          : 33 secs

        Memory use (Windows Private Bytes): 2,824,184K (slightly lower than 2,896,104K for llil2.pl)

        For completeness, here is my adjusted llil2d.pl:

        # llil2d.pl. Remarkable dualvar version based on [marioroy]'s concoction.
        # Example run: perl llil2d.pl tt1.txt tt2.txt tt3.txt >out.txt
        use strict;
        use warnings;
        use feature qw{say};
        use Scalar::Util qw{dualvar};

        # ----------------------------------------------------------------------
        # LLiL specification
        # ------------------
        # A LLiL-format file is a text file.
        # Each line consists of a lowercase name, a TAB character and a non-negative integer count.
        # That is, each line must match : ^[a-z]+\t\d+$
        # For example, reading the LLiL-format files, tt1.txt containing:
        #   camel\t42
        #   pearl\t94
        #   dromedary\t69
        # and tt2.txt containing:
        #   camel\t8
        #   hello\t12345
        #   dromedary\t1
        # returns this hashref:
        #   $hash_ret{"camel"}     = 50
        #   $hash_ret{"dromedary"} = 70
        #   $hash_ret{"hello"}     = 12345
        #   $hash_ret{"pearl"}     = 94
        # That is, values are added for items with the same key.
        #
        # To get the required LLiL text, you must sort the returned hashref
        # descending by value and insert a TAB separator:
        #   hello\t12345
        #   pearl\t94
        #   dromedary\t70
        #   camel\t50
        # To make testing via diff easier, we further sort ascending by name
        # for lines with the same value.
        # ----------------------------------------------------------------------

        # Function get_properties
        # Read a list of LLiL-format files
        # Return a reference to a hash of properties
        sub get_properties {
            my $files = shift;   # in: reference to a list of LLiL-format files
            my %hash_ret;        # out: reference to a hash of properties
            for my $fname ( @{$files} ) {
                open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
                while (<$fh>) {
                    chomp;
                    my ($word, $count) = split /\t/;
                    $hash_ret{$word} += $count;
                }
                close($fh) or die "error: close '$fname': $!";
            }
            return \%hash_ret;
        }

        # ----------------- mainline -------------------------------------------
        @ARGV or die "usage: $0 file...\n";
        my @llil_files = @ARGV;
        warn "llil2d start\n";
        my $tstart1 = time;
        my $href = get_properties( \@llil_files );
        my $tend1 = time;
        my $taken1 = $tend1 - $tstart1;
        warn "get_properties : $taken1 secs\n";

        my $tstart2 = time;
        my @data;
        while ( my ($k, $v) = each %{$href} ) { push @data, dualvar($v, $k) }

        # Using two sorts is waaay faster than one! (see [id://11148545])
        for my $key ( sort { $b <=> $a } sort @data ) {
            say "$key\t" . (0 + $key);
        }
        my $tend2 = time;
        my $taken2 = $tend2 - $tstart2;
        my $taken = $tend2 - $tstart1;
        warn "sort + output : $taken2 secs\n";
        warn "total : $taken secs\n";

        Update: llil2grt.pl is about three seconds faster than llil2d.pl above, while using slightly less memory.

        References Added Later

        Dualvar:

        Some ideas to try in the future:

        See also:

      After being surprised in the extreme by marioroy's astonishing two-sort dualvar Perl solution and my accidental jwkrahn-inspired llil2grt.pl Perl GRT solution, I couldn't restrain myself from trying to steal ideas from these two nodes to improve the (9 second) speed of my fastest llil2a C++ solution.

      llil2m.cpp

      Here is my C++ solution inspired by marioroy's famous two-sort Perl concoction.

      The above program can be compiled with or without MARIOROY_TWO_SORT_TRICK_L defined.

      Without MARIOROY_TWO_SORT_TRICK_L defined I saw:

      > llil2m big1.txt big2.txt big3.txt >llil2m.tmp
      llil2m start
      get_properties CPU time : 4.187 secs
      vector copy CPU time    : 0.612 secs
      vector sort CPU time    : 2.89 secs
      write stdout CPU time   : 1.383 secs
      total CPU time          : 9.074 secs
      total wall clock        : 9 secs
      So far so good. This is just the fastest C++ solution found previously.

      Beside myself with excitement, I defined MARIOROY_TWO_SORT_TRICK_L ... but sobbed when I saw:

      > llil2m big1.txt big2.txt big3.txt >llil2m.tmp
      llil2m start
      get_properties CPU time     : 4.286 secs
      vector copy CPU time        : 0.613 secs
      vector stable sort CPU time : 2.977 secs
      write stdout CPU time       : 1.363 secs
      total CPU time              : 9.244 secs
      total wall clock            : 9 secs
      Memory use (Windows Private Bytes): 1,218,716K

      No improvement. At all. Aaaargh! When I changed stable_sort to sort the sort speed improved dramatically from 2.977 secs to just 0.750 secs ... but of course the output was wrong. It seems there is a high performance price to pay for a stable sort in this case.

      Oh well, at least we now have an alternative nine second solution in C++ using a stable sort.

      Sorting Algorithms

      BTW, from sort - perl pragma to control sort() behaviour

      The sort pragma is now a no-op, and its use is discouraged. ... The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability.

      Given the above performance difference between stable and non-stable sorts, this looks like an unfortunate decision to me. After all, there may be times when you really want the performance boost of a non-stable sort in Perl.

      Update: Stroustrup, in his C++ Programming Language book, notes that stable_sort improves towards N*log(N) when the system has sufficient memory, but degrades to N*log(N)*log(N) otherwise ... so I guess the use case for needing a non-stable sort in Perl is when you are sorting huge arrays. See also Sorting algorithm (wikipedia).

      llil2grt.cpp

      Here is my C++ solution inspired by this llil2grt.pl Perl GRT solution.

      Desperate to avoid the sort function after being burnt during my llil2m.cpp adventure, and remembering jwkrahn's cute GRT trick of achieving a descending sort via an ascending sort of negative numbers, I had to try a similar stunt in C++.

      In llil2grt below, I avoided sort simply by creating an inverted set_int_str_type container from the existing map_str_int_type container.

      // llil2grt.cpp.
      // Inspired by llilgrt.pl: improve sort performance via a negative count.
      // g++ compile on Linux:
      //   g++ -o llil2grt -std=c++11 -Wall -O3 llil2grt.cpp
      // This g++ command also works with the mingw C++ compiler (https://sourceforge.net/projects/mingw-w64)
      // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.exe).
      // Example run: llil2grt tt1.txt tt2.txt tt3.txt >out.txt
      #include <cstddef>
      #include <cstdint>
      #include <cstdlib>
      #include <cstring>
      #include <ctime>
      #include <cstdio>
      #include <string>
      #include <vector>
      #include <set>
      #include <map>
      #include <algorithm>
      #include <utility>
      #include <iostream>
      #include <fstream>
      #include <sstream>

      static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, need a 64-bit compile");

      // ----------------------------------------------------------------------------
      // Crude hack to see Windows Private Bytes in Task Manager by sleeping at
      // program end (see also sleep hack at end of main)
      // #include <chrono>
      // #include <thread>
      // ----------------------------------------------------------------------------

      typedef long long llil_int_type;
      using str_int_type     = std::pair<std::string, llil_int_type>;
      using int_str_type     = std::pair<llil_int_type, std::string>;
      using map_str_int_type = std::map<std::string, llil_int_type>;
      using vec_str_int_type = std::vector<str_int_type>;
      using set_int_str_type = std::set<int_str_type>;

      // Mimic the Perl get_properties subroutine ----------------------------
      // Limit line length and use ANSI C functions to try to boost performance
      #define MAX_LINE_LEN_L 255
      static void get_properties(
         int               nfiles,    // in: the number of input files
         char*             fname[],   // in: the input file names
         map_str_int_type& hash_ret)  // out: a hash of properties
      {
         FILE* fh;
         char line[MAX_LINE_LEN_L+1];
         char* word;
         llil_int_type count;
         for (int i = 0; i < nfiles; ++i) {
            fh = ::fopen(fname[i], "r");
            if (fh == NULL) {
               std::cerr << "Error opening '" << fname[i] << "' : errno=" << errno << "\n";
               continue;
            }
            while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) {
               word  = ::strtok(line, "\t");
               count = ::atoll( ::strtok(NULL, "\n") );
               hash_ret[word] -= count;
            }
            ::fclose(fh);
         }
      }

      // ----------------------------------------------------------------------
      int main(int argc, char* argv[])
      {
         if (argc < 2) {
            std::cerr << "usage: llil2grt file1 file2 ... >out.txt\n";
            return 1;
         }
         std::cerr << "llil2grt start\n";
         time_t  tstart1 = ::time(NULL);
         clock_t cstart1 = ::clock();

         // Create the hash of properties
         map_str_int_type hash_ret;
         get_properties(argc - 1, &argv[1], hash_ret);

         clock_t cend1 = ::clock();
         double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SEC;
         std::cerr << "get_properties CPU time : " << ctaken1 << " secs\n";

         clock_t cstart2 = ::clock();

         // To avoid calling sort(), try creating an inverted std::set container
         // Note: negative count gives desired ordering (just like Perl GRT sort :)
         set_int_str_type s;
         for ( auto const& kv : hash_ret ) {
            // Note: emplace_hint() was a lot faster than emplace()
            s.emplace_hint( s.end(), kv.second, kv.first );
         }
         clock_t cend2s = ::clock();

         // Output the (already sorted) std::set - no sort() function required!
         // Note: fix up negative count via -n.first
         for ( auto const& n : s ) { std::cout << n.second << '\t' << -n.first << '\n'; }

         clock_t cend2 = ::clock();
         time_t tend2  = ::time(NULL);
         long   ttaken   = static_cast<long>(::difftime(tend2, tstart1) + 0.5);
         double ctaken   = (double) (cend2  - cstart1) / (double)CLOCKS_PER_SEC;
         double ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER_SEC;
         double ctaken2o = (double) (cend2  - cend2s)  / (double)CLOCKS_PER_SEC;
         std::cerr << "emplace set sort CPU time : " << ctaken2s << " secs\n";
         std::cerr << "write stdout CPU time     : " << ctaken2o << " secs\n";
         std::cerr << "total CPU time            : " << ctaken   << " secs\n";
         std::cerr << "total wall clock time     : " << ttaken   << " secs\n";

         // Hack to see Private Bytes in Windows Task Manager
         // (uncomment next line so process doesn't exit too quickly)
         // std::this_thread::sleep_for(std::chrono::milliseconds(90000000));

         return 0;
      }

      I was pleasantly surprised to see this improved my fastest C++ solution from 9 seconds down to 7:

      > llil2grt big1.txt big2.txt big3.txt >llil2grt.tmp
      llil2grt start
      get_properties CPU time   : 4.252 secs
      emplace set sort CPU time : 1.282 secs
      write stdout CPU time     : 1.716 secs
      total CPU time            : 7.254 secs
      total wall clock time     : 7 secs
      Memory use (Windows Private Bytes): 1,626,728K

      Though faster, it's no surprise memory use increased because instead of sorting in place, you're creating a new set_int_str_type container.

      Updated: Minor cosmetic changes were made to llil2grt.cpp after posting. Added "Sorting Algorithms" section with a bit more information on stable sorts.

Re: Rosetta Code: Long List is Long :awk(1)+sort(1)
by parv (Vicar) on Dec 12, 2022 at 07:59 UTC

    In awk(1) + sort(1) (environment) ...

    #!/bin/sh
    # Source: https://perlmonks.org/index.pl?node_id=11148773
    #
    # This is one shell implementation based on the problem specification ...
    #
    #   Rosetta Code: Long List is Long, 20221130,
    #   by eyepopslikeamosquito
    #   https://perlmonks.org/?node_id=11148465

    case $# in
      0 )
        printf "Give a list of files to sort.\n" >&2
        exit 1
        ;;
    esac

    start=$( date '+%s' )

    # Takes ~135 s.
    awk ' \
      { cat_count[ $1 ] += $2 } \
      END \
      { for ( cat in cat_count ) \
        { printf "%s\t%s\n", cat, cat_count[ cat ] } \
      } \
    ' $@ \
    | sort -k2,2rn -k1,1

    end=$( date '+%s' )
    printf "total time: %d s\n" $(( end - start )) >&2

      I tried LANG=C and sorting individually (two sorts).

      Results from a Linux box:

      54 seconds  LANG=en_US.UTF-8
      33 seconds  LANG=C  sort -k2,2rn -k1,1
      23 seconds  LANG=C  sort -k1,1 | sort -k2,2rn

      Testing:

      #!/bin/sh
      # https://www.perlmonks.org/?node_id=11148773

      if [ $# -eq 0 ]; then
        printf "Give a list of files to sort.\n" >&2
        exit 1
      fi

      LANG=C awk '
        { cat_count[ $1 ] += $2 }
        END {
          for ( cat in cat_count )
            printf "%s\t%s\n", cat, cat_count[ cat ]
        }
      ' $@ \
      | sort -k1,1 | sort -k2,2rn

      printf "total time: %d s\n" $SECONDS >&2

        Good point about using LANG=C (it makes for a fairer comparison: in my Python version I had set ASCII encoding to parse the input, but not during sorting🤔).

        With that change it takes ~99 s; with that and the two sorts, ~60 s.

Re: Rosetta Code: Long List is Long
by Anonymous Monk on Dec 02, 2022 at 00:22 UTC

    Maybe it is not worth anything, but this just exploits a loophole: ridiculously short fixed-length keys (6 bytes, a..z only), with counts fitting a single byte. $r is just to quickly convert decimal to/from base-27, nothing more. Sure, it could all be optimized. Excluding 0 from the base-27 representation (i.e. using 27 instead of 26) avoids padding with leading zeroes. It follows that the $buf_in length is a wasteful 387,420,489 instead of 308,915,776, but, what the heck, I can be royally wasteful with this implementation. Decrementing the count down from 255 instead of just incrementing from 0 is a further silly optimization which I only noticed this late in the day; no doubt I'm overlooking better improvements. Of course $MAX is 27**6 (or would be 26**6).

    I can't confirm what eyepopslikeamosquito says about memory: with the original llil.pl I see the Working Set go to approx. 2.9 GB. Mine doesn't exceed ~530 MB.

    llil start
    get_properties : 15 secs
    sort + output  : 103 secs
    total          : 118 secs

    my_test start
    get_properties : 31 secs
    sort + output  : 10 secs
    total          : 41 secs

    ...

    use strict;
    use warnings;
    use feature 'say';
    use Math::GMPz ':mpz';
    use Sort::Packed 'sort_packed';

    @ARGV or die "usage: $0 file...\n";
    my @llil_files = @ARGV;
    warn "my_test start\n";
    my $tstart1 = time;

    my $r = Rmpz_init;
    Rmpz_set_str( $r, 'zzzzzz' =~ tr/a-z1-9/1-9a-z/r, 27 );
    my $MAX = Rmpz_get_ui( $r );
    my ( $buf_in, $buf_out ) = ( "\xFF" x $MAX, '' );

    for my $fname ( @llil_files ) {
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        while ( <$fh> ) {
            chomp;
            my ( $word, $count ) = split /\t/;
            $word =~ tr/a-z1-9/1-9a-z/;
            Rmpz_set_str( $r, $word, 27 );
            vec( $buf_in, Rmpz_get_ui( $r ), 8 ) -= $count;
        }
        close( $fh ) or die "error: close '$fname': $!";
    }

    while ( $buf_in =~ /[^\xFF]/g ) {
        Rmpz_set_ui( $r, @- );
        $buf_out .= pack 'aa6', $&, Rmpz_get_str( $r, 27 ) =~ tr/1-9a-z/a-z/r
    }

    my $tend1 = time;
    warn "get_properties : ", $tend1 - $tstart1, " secs\n";
    my $tstart2 = time;

    sort_packed C7 => $buf_out;

    while ( $buf_out ) {
        my ( $count, $word ) = unpack 'Ca6', substr $buf_out, 0, 7, '';
        printf "%s\t%d\n", $word, 255 - $count
    }

    my $tend2 = time;
    warn "sort + output: ", $tend2 - $tstart2, " secs\n";
    warn "total: ", $tend2 - $tstart1, " secs\n";

    What follows is fiction, not implemented in code, and can be ignored. I said 'ridiculous' above, but in fact I do remember the original LLIL thread; I'm not sure now, but back then I thought the keys were expected to be significantly longer than qw/foo bar aaaaaa/, etc. (genetic sequences?). In that case the input files, mostly keys, would total multiple GB, so just keeping those keys in RAM is out of the question, never mind building and keeping hashes and working with them.

    I thought about high-quality hashing (xxHash?) of the keys and sparsely storing (Judy?) values indexed by the resulting integer, where each value is e.g. a 64-bit packed integer comprised of:

    • file id
    • offset of start of line containing unique key first seen (tell)
    • count (updated as files are read in)

    After all files are consumed, value positions within array (i.e. indexes) are no longer important. IF densely-packed array data (i.e. discard zero values) fits RAM, then problem is solved. Sort packed data, and produce output, which, yes, would require randomly reading lines (i.e. real keys) from input files AGAIN based on stored file id and line position.

      For reference, results of llil2d.pl (11148585) for my hardware are:

      llil2d start
      get_properties : 10 secs
      sort + output  : 21 secs
      total          : 31 secs
      2317524 Kbytes of RAM were used

      (a report about resident RAM size was produced with same few lines of code as in script below)

      In fact, "words" are so short (and few) that I realized (later, after I wrote 11148660 with results there) that I don't need any indirection and can simply use Judy::HS: "words" are kept in RAM all the time, both in my in-memory flat "DB" and, opaquely, somewhere in Judy code.

      And then there's a solution in Perl which is both faster and requires much less memory, and generates identical output:

      my_test start
      get_properties : 13 secs
      sort + output  : 7 secs
      total          : 20 secs
      349124 Kbytes of RAM were used

      A short integer to keep the count is not required; this example only demonstrates that the count can be any length, not limited to a byte as in my previous "solution" in this thread. The same goes for the 10 bytes that "words" are padded to: it can be anything that fits the longest word; words can differ in length.

      use strict;
      use warnings;
      use Judy::HS qw/ Set Get Free /;
      use Sort::Packed 'sort_packed';

      my $DATA_TEMPLATE    = 'nZ10';
      my $DATA_SIZE        = 12;
      my $COUNT_SIZE_BYTES = 2;
      my $COUNT_SIZE_BITS  = 16;
      my $COUNT_MAX        = int( 2 ** $COUNT_SIZE_BITS - 1 );

      @ARGV or die "usage: $0 file...\n";
      my @llil_files = @ARGV;
      warn "my_test start\n";
      my $tstart1 = time;

      my ( $data, $current ) = ( '', 0 );
      my $judy;

      for my $fname ( @llil_files ) {
          open( my $fh, '<', $fname ) or die $!;
          while ( <$fh> ) {
              chomp;
              my ( $word, $count ) = split /\t/;
              ( undef, my $val ) = Get( $judy, $word );
              if ( defined $val ) {
                  vec( $data, $val * $DATA_SIZE / $COUNT_SIZE_BYTES,
                      $COUNT_SIZE_BITS ) -= $count
              }
              else {
                  $data .= pack $DATA_TEMPLATE, $COUNT_MAX - $count, $word;
                  Set( $judy, $word, $current );
                  $current ++
              }
          }
      }
      Free( $judy );

      my $tend1 = time;
      warn "get_properties : ", $tend1 - $tstart1, " secs\n";
      my $tstart2 = time;

      sort_packed "C$DATA_SIZE", $data;

      while ( $data ) {
          my ( $count, $word ) = unpack $DATA_TEMPLATE, substr $data, 0, $DATA_SIZE, '';
          printf "%s\t%d\n", $word, $COUNT_MAX - $count
      }

      my $tend2 = time;
      warn "sort + output : ", $tend2 - $tstart2, " secs\n";
      warn "total : ", $tend2 - $tstart1, " secs\n";

      use Memory::Usage;
      my $m = Memory::Usage-> new;
      $m-> record;
      warn $m-> state-> [0][3], " Kbytes of RAM were used\n";

      What if "words" are significantly longer? With approx. 10e6 unique words in this test, if they were each hundreds of bytes, then several GB of RAM would be used just to keep them. Perhaps impractical.

      So I'm returning to my idea of keeping only hashes and offsets within files. Results I posted in 11148660 are of course valid, I was using 64-bit hashes as Judy::L integer keys.

      In fact, 32-bit hashes generated with xxHash have a few collisions for 10e6 6-character words; 64-bit hashes have no collisions. I'm not qualified to predict up to what set size it is safe to expect no collisions. Maybe the 128-bit hashes, below, are overkill.

      Just for entertainment I decided to write "indirect" solution with 128-bit hashes. Therefore I'm not using Judy::L, but same Judy::HS with keys being 32-char hex-encoded hashes. Otherwise, almost same plan and code. Data template layout chosen arbitrarily and can be adjusted.

      Of course, the words are not alphabetically sorted in the output, but OTOH that was not an original LLIL requirement. Results:

      my_test start
      get_properties : 21 secs
      sort + output  : 23 secs
      total          : 44 secs
      841880 Kbytes of RAM were used

      I think the same amount of RAM would be used if words were not 6 but 600 or 6000 bytes. The relatively fast 2nd phase (considering the huge number of random reads) is due to NVMe storage here.

      (Off topic: I managed to install Crypt::xxHash under Windows/Strawberry, but Judy was too much of a challenge for me. I wrote a solution very close to the code below, using not Judy but a very thin Inline::C wrapper for the AVL library (google for jsw_avltree). It uses approx. the same RAM and ~2x the time for the 1st phase, but also ~2.5x the time for the 2nd phase, which is exactly the same code on the same hardware (dual boot). I don't know if Windows I/O is just that much slower.)

      use strict;
      use warnings;
      use Judy::HS qw/ Set Get Free /;
      use Crypt::xxHash 'xxhash3_128bits_hex';
      use Sort::Packed 'sort_packed';

      my $DATA_TEMPLATE = 'nnNn';   # word count
                                    # file index
                                    # word position
                                    # word length
      my $DATA_SIZE        = 10;
      my $COUNT_SIZE_BYTES = 2;
      my $COUNT_SIZE_BITS  = 16;
      my $COUNT_MAX        = int( 2 ** $COUNT_SIZE_BITS - 1 );

      @ARGV or die "usage: $0 file...\n";
      my @llil_files = @ARGV;
      warn "my_test start\n";
      my $tstart1 = time;

      my ( $data, $current ) = ( '', 0 );
      my $judy;

      for my $idx ( 0 .. $#llil_files ) {
          open( my $fh, '<', $llil_files[ $idx ]) or die $!;
          until ( eof $fh ) {
              my $pos = tell $fh;
              $_ = <$fh>;
              chomp;
              my ( $word, $count ) = split /\t/;
              my $xx = xxhash3_128bits_hex( $word, 0 );
              ( undef, my $val ) = Get( $judy, $xx );
              if ( defined $val ) {
                  vec( $data, $val * $DATA_SIZE / $COUNT_SIZE_BYTES,
                      $COUNT_SIZE_BITS ) -= $count
              }
              else {
                  $data .= pack $DATA_TEMPLATE,
                      $COUNT_MAX - $count, $idx, $pos, length $word;
                  Set( $judy, $xx, $current );
                  $current ++
              }
          }
      }
      Free( $judy );

      my $tend1 = time;
      warn "get_properties : ", $tend1 - $tstart1, " secs\n";
      my $tstart2 = time;

      sort_packed "C$DATA_SIZE", $data;

      my @fh;
      open $fh[ $_ ], '<', $llil_files[ $_ ] for 0 .. $#llil_files;

      while ( $data ) {
          my ( $count, $idx, $pos, $len ) =
              unpack $DATA_TEMPLATE, substr $data, 0, $DATA_SIZE, '';
          sysseek $fh[ $idx ], $pos, 0;
          sysread $fh[ $idx ], my( $word ), $len;
          printf "%s\t%d\n", $word, $COUNT_MAX - $count
      }

      my $tend2 = time;
      warn "sort + output : ", $tend2 - $tstart2, " secs\n";
      warn "total : ", $tend2 - $tstart1, " secs\n";

      use Memory::Usage;
      my $m = Memory::Usage-> new;
      $m-> record;
      warn $m-> state-> [0][3], " Kbytes of RAM were used\n";

        Thank you, Anonymous Monk! I tried and enjoyed your solution, including doubling the data sizes, e.g. a 32-bit count and a key length of 20.

        $ diff llil_judyhs1.pl llil_judyhs2.pl
        7,10c7,10
        < my $DATA_TEMPLATE    = 'nZ10';
        < my $DATA_SIZE        = 12;
        < my $COUNT_SIZE_BYTES = 2;
        < my $COUNT_SIZE_BITS  = 16;
        ---
        > my $DATA_TEMPLATE    = 'NZ20';
        > my $DATA_SIZE        = 24;
        > my $COUNT_SIZE_BYTES = 4;
        > my $COUNT_SIZE_BITS  = 32;

        $ time perl llil_judyhs1.pl big1.txt big2.txt big3.txt >out1.txt
        my_test start
        get_properties : 10 secs
        sort + output  : 5 secs
        total          : 15 secs
        353468 Kbytes of RAM were used

        real    0m14.770s
        user    0m14.669s
        sys     0m0.084s

        $ time perl llil_judyhs2.pl big1.txt big2.txt big3.txt >out2.txt
        my_test start
        get_properties : 10 secs
        sort + output  : 5 secs
        total          : 15 secs
        473784 Kbytes of RAM were used

        real    0m15.073s
        user    0m14.938s
        sys     0m0.119s

        Thanks anonymonk. Excellent work!

        Though I've never used them, I've heard good things about Judy Arrays and maintain a list of references on them at PM. Might get around to actually using them one day. :)

        What if "words" are significantly longer? With approx. 10e6 unique words in this test, if they were each hundreds of bytes, then several GB of RAM would be used just to keep them. Perhaps impractical.

        Good question! Apologies, my initial test file generator was very primitive. To try to help answer your question I've quickly whipped up a test file generator that generates much longer keys (up to around 200 characters in length) and longer counts too. I was conservative with the counts because I didn't want to disqualify folks using 32-bit ints.

        # gen-long-llil.pl
        # Crude program to generate a LLiL test file with long names and counts
        # perl gen-long-llil.pl long1.txt 600
        use strict;
        use warnings;
        use autodie;

        {
            my $ordmin = ord('a');
            my $ordmax = ord('z') + 1;

            # Generate a random word
            sub gen_random_word {
                my $word  = shift;   # word prefix
                my $nchar = shift;   # the number of random chars to append
                for my $i (1 .. $nchar) {
                    $word .= chr( $ordmin + int( rand($ordmax - $ordmin) ) );
                }
                return $word;
            }
        }

        my $longworda = join '', 'a' .. 'z';
        my $longwordz = join '', reverse('a' .. 'z');
        my $longcount = 1_000_000;

        sub create_long_test_file {
            my $fname   = shift;
            my $howmany = shift;
            open( my $fh_out, '>', $fname );

            # Some with no randomness
            for my $h ( 1 .. $howmany ) {
                for my $i ( 1 .. 8 ) {
                    my $cnt   = $longcount + $i - 1;
                    my $worda = $longworda x $i;
                    my $wordz = $longwordz x $i;
                    print {$fh_out} "$worda\t$cnt\n$wordz\t$cnt\n";
                }
            }

            # Some with randomness
            my $wordlen = 1;
            for my $h ( 1 .. $howmany ) {
                for my $i ( 1 .. 8 ) {
                    my $cnt   = $longcount + $i - 1;
                    my $worda = $longworda x $i;
                    my $wordz = $longwordz x $i;
                    for my $c ( 'a' .. 'z' ) {
                        for my $z ( 1 .. 2 ) {
                            print {$fh_out} $worda . gen_random_word( $c, $wordlen )
                                . "\t" . (1000000 + $z) . "\n";
                            print {$fh_out} $wordz . gen_random_word( $c, $wordlen )
                                . "\t" . (1000000 + $z) . "\n";
                        }
                    }
                }
            }
        }

        my $outfile = shift;
        my $count   = shift;
        $outfile or die "usage: $0 outfile count\n";
        $count   or die "usage: $0 outfile count\n";
        $count =~ /^\d+$/ or die "error: count '$count' is not a number\n";
        print "generating short long test file '$outfile' with count '$count'\n";
        create_long_test_file( $outfile, $count );
        print "file size=", -s $outfile, "\n";

        I ran it like this:

        > perl gen-long-llil.pl long1.txt 600
        generating short long test file 'long1.txt' with count '600'
        file size=65616000
        > perl gen-long-llil.pl long2.txt 600
        generating short long test file 'long2.txt' with count '600'
        file size=65616000
        > perl gen-long-llil.pl long3.txt 600
        generating short long test file 'long3.txt' with count '600'
        file size=65616000

        Then reran my two biggish benchmarks with a mixture of files:

        > perl llil2d.pl big1.txt big2.txt big3.txt long1.txt long2.txt long3.txt >perl2.tmp
        llil2d start
        get_properties : 11 secs
        sort + output  : 23 secs
        total          : 34 secs
        > llil2a big1.txt big2.txt big3.txt long1.txt long2.txt long3.txt >cpp2.tmp
        llil2 start
        get_properties : 6 secs
        sort + output  : 5 secs
        total          : 11 secs
        > diff cpp2.tmp perl2.tmp

        Improved test file generators welcome.

        Updated Test File Generators

        These were updated to allow a "\n" (rather than "\r\n") on Windows after this was pointed out here. Curiously, \n seems to be slower than \r\n on Windows if you don't set binmode! I am guessing that chomp is slower with \n than with \r\n on a Windows text stream.

        gen-llil.pl

        # gen-llil.pl
        # Crude program to generate a big LLiL test file to use in benchmarks
        # On Windows running:
        #   perl gen-llil.pl big2.txt 200 3     - produces a test file with size = 35,152,000 bytes
        #                                         (lines terminated with "\r\n")
        #   perl gen-llil.pl big2.txt 200 3 1   - produces a test file with size = 31,636,800 bytes
        #                                         (lines terminated with "\n")
        # On Unix, lines are terminated with "\n" and the file size is always 31,636,800 bytes
        use strict;
        use warnings;
        use autodie;

        {
            my $ordmin = ord('a');
            my $ordmax = ord('z') + 1;

            # Generate a random word
            sub gen_random_word {
                my $word  = shift;   # word prefix
                my $nchar = shift;   # the number of random chars to append
                for my $i (1 .. $nchar) {
                    $word .= chr( $ordmin + int( rand($ordmax - $ordmin) ) );
                }
                return $word;
            }
        }

        sub create_test_file {
            my $fname   = shift;
            my $count   = shift;
            my $wordlen = shift;
            my $fbin    = shift;
            open( my $fh_out, '>', $fname );
            $fbin and binmode($fh_out);
            for my $c ( 'aaa' .. 'zzz' ) {
                for my $i (1 .. $count) {
                    print {$fh_out} gen_random_word( $c, $wordlen ) . "\t" . 1 . "\n";
                }
            }
        }

        my $outfile = shift;
        my $count   = shift;
        my $wordlen = shift;
        my $fbin    = shift;   # default is to use text stream (not a binary stream)
        defined($fbin) or $fbin = 0;
        $outfile or die "usage: $0 outfile count wordlen\n";
        $count   or die "usage: $0 outfile count wordlen\n";
        print "generating test file '$outfile' with count '$count' (binmode=$fbin)\n";
        create_test_file($outfile, $count, $wordlen, $fbin);
        print "file size=", -s $outfile, "\n";

        gen-long-llil.pl

        # gen-long-llil.pl
        # Crude program to generate a LLiL test file with long names and counts
        # perl gen-long-llil.pl long1.txt 600
        # On Windows running:
        #   perl gen-long-llil.pl long1.txt 600     - produces a test file with size = 65,616,000 bytes
        #                                             (lines terminated with "\r\n")
        #   perl gen-long-llil.pl long1.txt 600 1   - produces a test file with size = 65,107,200 bytes
        #                                             (lines terminated with "\n")
        # On Unix, lines are terminated with "\n" and the file size is always 65,107,200 bytes
        use strict;
        use warnings;
        use autodie;

        {
            my $ordmin = ord('a');
            my $ordmax = ord('z') + 1;

            # Generate a random word
            sub gen_random_word {
                my $word  = shift;   # word prefix
                my $nchar = shift;   # the number of random chars to append
                for my $i (1 .. $nchar) {
                    $word .= chr( $ordmin + int( rand($ordmax - $ordmin) ) );
                }
                return $word;
            }
        }

        my $longworda = join '', 'a' .. 'z';
        my $longwordz = join '', reverse('a' .. 'z');
        my $longcount = 1_000_000;

        sub create_long_test_file {
            my $fname   = shift;
            my $howmany = shift;
            my $fbin    = shift;
            open( my $fh_out, '>', $fname );
            $fbin and binmode($fh_out);

            # Some with no randomness
            for my $h ( 1 .. $howmany ) {
                for my $i ( 1 .. 8 ) {
                    my $cnt   = $longcount + $i - 1;
                    my $worda = $longworda x $i;
                    my $wordz = $longwordz x $i;
                    print {$fh_out} "$worda\t$cnt\n$wordz\t$cnt\n";
                }
            }

            # Some with randomness
            my $wordlen = 1;
            for my $h ( 1 .. $howmany ) {
                for my $i ( 1 .. 8 ) {
                    my $cnt   = $longcount + $i - 1;
                    my $worda = $longworda x $i;
                    my $wordz = $longwordz x $i;
                    for my $c ( 'a' .. 'z' ) {
                        for my $z ( 1 .. 2 ) {
                            print {$fh_out} $worda . gen_random_word( $c, $wordlen )
                                . "\t" . (1000000 + $z) . "\n";
                            print {$fh_out} $wordz . gen_random_word( $c, $wordlen )
                                . "\t" . (1000000 + $z) . "\n";
                        }
                    }
                }
            }
        }

        my $outfile = shift;
        my $count   = shift;
        my $fbin    = shift;   # default is to use text stream (not a binary stream)
        defined($fbin) or $fbin = 0;
        $outfile or die "usage: $0 outfile count\n";
        $count   or die "usage: $0 outfile count\n";
        $count =~ /^\d+$/ or die "error: count '$count' is not a number\n";
        print "generating short long test file '$outfile' with count '$count' (binmode=$fbin)\n";
        create_long_test_file( $outfile, $count, $fbin );
        print "file size=", -s $outfile, "\n";

        Updated this node with new test file generators so you can generate test files that are the same size on Unix and Windows. That is, by setting $fbin you can make the line ending "\n" on Windows, instead of "\r\n". See Re^2: Rosetta Code: Long List is Long (faster) for more background.

Re: Rosetta Code: Long List is Long (faster)
by Anonymous Monk on Dec 27, 2022 at 14:47 UTC
    Please feel free to respond away with solutions...

    Short version: the J script runs in ~5.7 sec using ~650 MB to produce exactly the same output, which makes it the fastest of all solutions so far.

    (The caveat: "words" are assumed to be not too different in length; they are temporarily padded to the longest length during the program run (i.e. not padded at all with the current test sample). Expect deterioration if, say, their lengths differ by two or more orders of magnitude, especially if just a few are very long. I didn't check.)

    ###################### (Prose and previous results, can be skipped)

    I was shocked to notice this thread is almost a month old already. While I'm in no hurry and have been pursuing what follows at leisure and only rarely (kind of "for dessert"), it's better to publish this at long last, be it the "final" optimized version or not (I'm sure it can be improved a lot), before the thread goes cold and the participants have to make an effort to re-read their own code because of the time elapsed.

    As a reference frame, here is an assortment of results from previous solutions, on my hardware:

    llil2grt.pl (see 11148713) (fastest "pure core-only Perl"), Windows:

    llil2grt start
    get_properties : 16 secs
    sort + output  : 31 secs
    total          : 47 secs
    Peak working set (memory): 2,744,200K

    Same code & data & PC, Linux (I didn't investigate why there is such a difference):

    llil2grt start
    get_properties : 11 secs
    sort + output  : 20 secs
    total          : 31 secs
    2,152,848 Kbytes of RAM were used

    I assume that Judy (see 11148585) is best in both speed and memory among non-parallel Perl solutions, with the same caveat as at the very top: "words" are temporarily padded to a fixed length, e.g. here to 10 bytes:

    my_Judy_test start
    get_properties : 13 secs
    sort + output  : 7 secs
    total          : 20 secs
    349,124 Kbytes of RAM were used

    Being a lazy bum, I didn't compile the C++ solutions (nor do I code in C++). Here is a copy-paste from 11148969; I assume it is the best result so far, among C++ and all others. (For my PC, I expect the time to be worse.)

    llil2grt start
    get_properties CPU time : 4.252 secs
    emplace set sort CPU time : 1.282 secs
    write stdout CPU time : 1.716 secs
    total CPU time : 7.254 secs
    total wall clock time : 7 secs
    Memory use (Windows Private Bytes): 1,626,728K

    ###################### (End of prose and previous results)

    The code below generates the next message, with RAM usage taken from Windows Task Manager (to be on par with how it was measured for Perl), while the script pauses for a key (the usual Ctrl-D on Linux, or Ctrl-Z + Enter on Windows) after it finishes:

    Read and parse input: 1.636
    Classify, sum, sort: 2.206
    Format and write output: 1.895
    Total time: 5.737

    Finished. Waiting for a key...

    Peak working set (memory): 657,792K

    The injection of CR into output lines is only required on Windows (actually, not required at all) to later ensure there is no difference with the output from Perl. The "magic constant" 3 for number width can be anything, and is only used for an intermediate step.

    I had to make this code slightly less readable than it was during development by somewhat aggressively re-using the same variable names for words and nums over and over again, as data are processed and modified while the script progresses. There were different "self-explanatory" names at each stage, but because the arrays are huge, it's better to immediately overwrite a variable on successive assignments to conserve memory. "Erasing" throw-away helper arrays (similar to undef in Perl) serves the same purpose.

    Actually, during development I was playing with this toy dataset; here's the original data and result:

       text =: noun define
    tango	1
    charlie	2
    bravo	1
    foxtrot	4
    alpha	3
    foxtrot	1
    bravo	1
    foxtrot	7
    )
    
        NB. Do work here...
        
        ] text
    foxtrot	12
    alpha	3
    bravo	2
    charlie	2
    tango	1
    

    The script:

    NB. -----------------------------------------------------------
    NB. --- This file is "llil.ijs"
    NB. --- Run as e.g.:
    NB.
    NB.     jconsole.exe llil.ijs big1.txt big2.txt big3.txt out.txt
    NB.
    NB. --- (NOTE: last arg is output filename, file is overwritten)
    NB. -----------------------------------------------------------

    args =: 2 }. ARGV
    fn_out =: {: args
    fn_in =: }: args

    NUM_LENGTH =: 3
    PAD_CHAR =: ' '

    make_sel =: [: (1 2 0& |:) @ ,: ([ ,. ] - [: , [)
    sort_some =: ([: /:~ {)`[`] }

    text =: , freads " 0 fn_in
    lf_pos =: I. text = LF
    tab_pos =: I. text = TAB
    words =: ((0 , >: }: lf_pos) make_sel tab_pos) ];.0 text
    nums =: 0&". (tab_pos make_sel lf_pos) ; @: (<;.0) text
    erase 'text' ; 'lf_pos' ; 'tab_pos'

    t1 =: (6!:1) ''  NB. time since engine start

    nums =: words +//. nums
    words =: ~. words
    'words nums' =: (\: nums)& { &.:>"_1 words ; nums

    starts =: I. ~: nums
    ranges =: starts ,. (}. starts , # nums) - starts
    count =: # starts

    sort_words =: monad define
    'ranges words' =. y
    range =. ({. + i. @ {:) {. ranges
    (}. ranges) ; range sort_some words
    )

    words =: > {: sort_words ^: count ranges ; words
    erase 'starts' ; 'ranges'

    t2 =: (6!:1) ''  NB. time since engine start

    nums =: (- NUM_LENGTH) ]\ NUM_LENGTH ": nums
    text =: , words ,. TAB ,. (nums ,. CR) ,. LF
    erase 'words' ; 'nums'
    text =: (#~ ~: & PAD_CHAR) text
    text fwrite fn_out
    erase < 'text'

    t3 =: (6!:1) ''  NB. time since engine start

    echo 'Read and parse input:    ' , ": t1
    echo 'Classify, sum, sort:     ' , ": t2 - t1
    echo 'Format and write output: ' , ": t3 - t2
    echo 'Total time:              ' , ": t3
    echo ''
    echo 'Finished. Waiting for a key...'
    stdin ''
    exit 0

      I was shocked to notice this thread is almost a month old already ... it's better to publish this at long last, be it "final" optimized version or not (I'm sure it can be improved a lot), before the thread is dead cold and whoever participated have to make effort to read their own code because of time elapsed

      Thanks for posting. I suggest you just take your time and enjoy it without worrying too much about how old the thread is. This is actually one of my favourite features of Perl Monks, so much so that I wrote a meditation about it: Necroposting Considered Beneficial. :)

      Your reply motivated me to finally get around to installing Linux on my newish Windows laptop. Being lazy, I did this with a single wsl --install command to install the default Ubuntu distribution of Linux from the Microsoft Store. AFAIK, the main alternative is to install VMware, followed by multiple different Linux distros.

      Anyways, after doing that I saw similar performance from both my fastest Perl and C++ versions.

      For llil2grt.cpp on Windows 11:

      llil2grt start
      get_properties CPU time : 4.252 secs
      emplace set sort CPU time : 1.282 secs
      write stdout CPU time : 1.716 secs
      total CPU time : 7.254 secs
      total wall clock time : 7 secs

      On Ubuntu WSL2 Linux (5.15.79.1-microsoft-standard-WSL2), running on the same hardware, compiled with the identical g++ -o llil2grt -std=c++11 -Wall -O3 llil2grt.cpp, we see it runs slightly faster:

      llil2grt start
      get_properties CPU time : 3.63153 secs
      emplace set sort CPU time : 1.09085 secs
      write stdout CPU time : 1.41164 secs
      total CPU time : 6.13412 secs
      total wall clock time : 6 secs

      Sadly, it's becoming increasingly obvious that to make this run significantly faster, I'll probably have to change the simple and elegant:

      hash_ret[word] -= count;
      into something much uglier, possibly containing "emplace_hint" or other std::map claptrap ... and I just can't bring myself to do that. :) More attractive is to leave the simple and elegant one-liner alone and instead try to inject a faster custom std::map memory allocator (I have no idea how to do that yet).

      Conversely, my fastest Perl solution llil2grt.pl runs slightly slower on Ubuntu:

      llil2grt start
      get_properties : 10 secs
      sort + output  : 22 secs
      total          : 32 secs
      compared to 10, 20, 30 secs on Windows 11. Perl v5.34.0 on Linux vs Strawberry Perl v5.32.0 on Windows.

      Update: this shortened version, llil2cmd-long.pl, runs a bit faster on Ubuntu (but not on Windows):

      llil2cmd-long start
      get_properties : 7 secs
      sort + output  : 21 secs
      total          : 28 secs

      The injection of CR into output lines is only required on Windows (actually, not required at all)

      Yes, you're right, Windows nowadays seems perfectly happy with text files terminated with \n rather than the traditional DOS \r\n. By default, Perl and C++ both output text files with "\n" on Unix and "\r\n" on Windows. I'll update my test file generator to generate identical "\n" terminated files on both Unix and Windows.

      Update: Test file generators updated here: Re^3: Rosetta Code: Long List is Long (Test File Generators). Curiously, \n seems to be slower than \r\n on Windows if you don't set binmode! I am guessing that chomp is slower with \n than with \r\n on a Windows text stream.

      Update: Are you running Strawberry Perl on Windows? Which version? (Trying to understand why your Windows Perl seems slower than mine).

      Update: The processor and SSD disk (see Novabench top scoring disks) on my HP laptop:

      Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz, 1501 Mhz, 4 Core(s), 8 Logical Processor(s)
      Disk (238 GB SSD): Intel Optane+238GBSSD (ave sequential read 513 MB/s; ave sequential write 280 MB/s) - score 73

      References Added Later

      Updated: Noted that llil2grt.pl on Ubuntu Linux runs slightly slower on Linux than Windows, along with detailed timings. Clarified that I'm running Windows 11. Added more detail on my laptop hardware. Mentioned llil2cmd-long.pl, developed later.

        By "same PC" for running a test under Windows vs. Linux, I meant classic dual boot with GRUB, as I have; but perhaps virtualization is no longer a hindrance, and, moreover, not in this case. I compiled llil2grt.cpp in both OSes (command line and options you advised); here's what I got:

        $ ./llil2grt big1.txt big2.txt big3.txt >out.txt
        llil2grt start
        get_properties CPU time : 3.67015 secs
        emplace set sort CPU time : 1.19724 secs
        write stdout CPU time : 1.52074 secs
        total CPU time : 6.3882 secs
        total wall clock time : 6 secs

        >llil2grt.exe big1.txt big2.txt big3.txt >out.txt
        llil2grt start
        get_properties CPU time : 5.577 secs
        emplace set sort CPU time : 1.675 secs
        write stdout CPU time : 2.484 secs
        total CPU time : 9.736 secs
        total wall clock time : 10 secs

        I observe the same relative difference running the Perl script. So it looks like it's not an issue of the Perl and Strawberry version you asked me about (which is the latest available, "5.32.1.1-64bit-PDL", BTW). I further compiled llil2grt.cpp using the MinGW shell and g++ which came with the older 5.26 Strawberry, and got the same 10 secs.

        I'm clueless as to why this PC is slow with Windows. Perhaps either MS or Dell issued a patch in recent years to address a Kaby Lake CPU (mine is i5-7500T) "vulnerability"? I vaguely remember it was said performance would suffer if such a patch to guard against mythical attackers were applied. Just a guess, sorry if it's ridiculous. On the other hand, the J script performs the same in both OSes.

        Speaking of which, to return to "interpreted language is faster than compiled C++" -- of course it's dataset bias to a large extent. I ran test with "long" files from another test file generator you suggested previously:

        $ ./llil2grt long1.txt long2.txt long3.txt >out.txt
        llil2grt start
        get_properties CPU time : 0.559273 secs
        emplace set sort CPU time : 0.004393 secs
        write stdout CPU time : 0.003275 secs
        total CPU time : 0.567 secs
        total wall clock time : 1 secs

        $ jconsole llil.ijs long1.txt long2.txt long3.txt out_j.txt
        Read and parse input: 1.50791
        Classify, sum, sort: 0.70953
        Format and write output: 0.00430393
        Total time: 2.22175

        $ diff out.txt out_j.txt
        $

        (NUM_LENGTH changed to 10, otherwise same code)

      I tried the J script demonstration, by Anonymous Monk, on a box running Clear Linux. The following are the steps taken.

      Installation:

      cd ~/Downloads
      wget http://www.jsoftware.com/download/j903/install/j903_linux64.tar.gz
      tar xzf j903_linux64.tar.gz -C ~/

      Script modification:

      Next, I made a change to not output CR, so I was able to compare against the original output without CR. For some reason, pressing a key or the return key does not exit the script (resolved by removing two lines).

      $ diff llil.ijs llil2.ijs
      50c50
      < text =: , words ,. TAB ,. (nums ,. CR) ,. LF
      ---
      > text =: , words ,. TAB ,. nums ,. LF
      64,65d63
      < echo 'Finished. Waiting for a key...'
      < stdin ''

      Non-AVX results:

      $ ~/j903/bin/jconsole llil2.ijs big1.txt big2.txt big3.txt out.txt
      Read and parse input: 0.850145
      Classify, sum, sort: 2.08596
      Format and write output: 1.24986
      Total time: 4.18597

      AVX2 results:

      From the wiki, "The initial installation is non-AVX and finishing the steps upgrades to AVX or AVX2 as appropriate."

      # Update J engine to AVX or AVX2, depending on the CPU.
      $ ~/j903/updateje.sh
      Engine: j903/j64avx2/linux
      Release-b: commercial/2022-01-28T04:13:29
      Library: 9.03.08
      Platform: Linux 64
      Installer: J903 install
      InstallPath: /home/mario/j903
      Contact: www.jsoftware.com

      $ ~/j903/bin/jconsole llil2.ijs big1.txt big2.txt big3.txt out.txt
      Read and parse input: 0.789199
      Classify, sum, sort: 1.56741
      Format and write output: 1.12442
      Total time: 3.48103

      Anonymous Monk, blessings and grace. Thank you, for the J Programming Language demonstration.

      J script runs in ~5.7 sec using ~650 MB to produce exactly same output. Which makes it fastest of all solutions so far.

      The good news is that I've now found two different C++ versions that are faster. The bad news is that they're much uglier than my original llil2grt.cpp (which ran in about 6.2 seconds on Ubuntu). After getting nowhere trying to speed up llil2grt.cpp, I reluctantly decided that the simple and elegant hash_ret[word] -= count had to go.

      First the timings on Ubuntu. Because there's quite a bit of natural variation between runs, I did three runs of each:

      llil2vec start
      get_properties CPU time : 3.06313 secs
      emplace set sort CPU time : 0.923435 secs
      write stdout CPU time : 1.392 secs
      total CPU time : 5.37868 secs
      total wall clock time : 6 secs

      llil2vec start
      get_properties CPU time : 3.1567 secs
      emplace set sort CPU time : 0.970294 secs
      write stdout CPU time : 1.22305 secs
      total CPU time : 5.35015 secs
      total wall clock time : 5 secs

      llil2vec start
      get_properties CPU time : 3.32019 secs
      emplace set sort CPU time : 1.08277 secs
      write stdout CPU time : 1.22461 secs
      total CPU time : 5.62766 secs
      total wall clock time : 5 secs

      Ave CPU time: 5.5 secs (Memory use (Windows Private Bytes): 1,225,580K)

      llil2vec (fixed string length=6) start
      get_properties CPU time : 2.09353 secs
      emplace set sort CPU time : 0.795144 secs
      write stdout CPU time : 1.20994 secs
      total CPU time : 4.09871 secs
      total wall clock time : 4 secs

      llil2vec (fixed string length=6) start
      get_properties CPU time : 2.2078 secs
      emplace set sort CPU time : 0.707252 secs
      write stdout CPU time : 1.14867 secs
      total CPU time : 4.06383 secs
      total wall clock time : 4 secs

      llil2vec (fixed string length=6) start
      get_properties CPU time : 2.39225 secs
      emplace set sort CPU time : 1.0033 secs
      write stdout CPU time : 1.22765 secs
      total CPU time : 4.62331 secs
      total wall clock time : 4 secs

      Ave CPU time: 4.3 secs (Memory use (Windows Private Bytes): 814,940K)

      The first one still uses std::string so there are no limitations on word length. The second one is limited to a word length of six characters, and so works with big1.txt, while crashing spectacularly with long1.txt.

      Funny, I'd originally forgotten that way back in 2014, when desperately trying to make a search program run a hundred times faster, I'd stumbled upon this classic quote:

      That's not a minor disadvantage, we're talking about things being 50 to 100 times slower with a linked list. Compactness matters. Vectors are more compact than lists. And predictable usage patterns matter enormously. With a vector, you have to shove a lot of elements over, but caches are really really good at that.

      When you traverse lists you keep doing random access ... so you are actually random accessing your memory and you are maximizing your cache misses, which is exactly the opposite of what you want.

      -- Bjarne Stroustrup: Why you should avoid Linked Lists (3:30-5:00) (update: see also)

      Back then, because each memory access into my (huge) 4GB lookup tables was essentially random, most memory accesses missed the L1 cache, missed the L2 cache, missed the L3 cache, then waited for the cache line to be loaded into all three caches from main memory, while often incurring a TLB cache miss, just to rub salt into the wounds.

      Thankfully, there are no 4 GB lookup tables in this problem. But hashes (and linked lists) still tend to be mind-bogglingly slower than vectors on modern hardware, as indicated by Stroustrup's quote above. So I reluctantly decided that the simple and elegant hash_ret[word] -= count had to go.

      Though I really hate this new std::vector based solution, it does run faster than my earlier std::map based one. This is just a first attempt, further improvements are possible. The llil2vec.cpp source code is shown below. Though there are two sets of timings, there is only one program, compiled with or without MAX_STR_LEN_L defined. Suggestions for improvements welcome.

      llil2vec.cpp

        For the fixed string length build (I ran several times), clang++ comes out ~0.3 seconds faster than g++. See also the J script result with AVX2 enabled (total time 3.48103 secs).

        # gcc 12.2.1
        $ g++ -o llil2vec -std=c++11 -Wall -O3 -march=native -mtune=skylake -falign-functions=32 -fno-semantic-interposition -mno-vzeroupper -mprefer-vector-width=256 llil2vec.cpp
        $ ./llil2vec big1.txt big2.txt big3.txt >out.txt
        llil2vec (fixed string length=6) start
        get_properties CPU time : 1.89609 secs
        emplace set sort CPU time : 0.544972 secs
        write stdout CPU time : 0.842451 secs
        total CPU time : 3.28355 secs
        total wall clock time : 4 secs

        # clang version 15.0.4
        $ clang++ -o llil2vec -std=c++11 -Wall -O3 -march=native -mtune=skylake -falign-functions=32 -fno-semantic-interposition -mno-vzeroupper -mprefer-vector-width=256 llil2vec.cpp
        $ ./llil2vec big1.txt big2.txt big3.txt >out.txt
        llil2vec (fixed string length=6) start
        get_properties CPU time : 1.67073 secs
        emplace set sort CPU time : 0.474207 secs
        write stdout CPU time : 0.828457 secs
        total CPU time : 2.97343 secs
        total wall clock time : 3 secs

        Thanks to the excellent feedback from marioroy I've been able to improve the code as follows:

        • Compile MAX_STR_LEN_L version with clang++, otherwise g++
        • For MAX_STR_LEN_L version, reduced size of short strings by one char by not null-terminating them
        • Use good old printf to write to stdout (seems a bit faster than streaming to std::cout)
        • Minor code cleanups (especially around creating the inverted std::set)

        New Timings for Long String Version

        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec start
        get_properties CPU time : 3.06708 secs
        emplace set sort CPU time : 0.903617 secs
        write stdout CPU time : 1.23459 secs
        total CPU time : 5.20544 secs
        total wall clock time : 5 secs
        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec start
        get_properties CPU time : 3.4832 secs
        emplace set sort CPU time : 1.10209 secs
        write stdout CPU time : 0.939805 secs
        total CPU time : 5.52519 secs
        total wall clock time : 6 secs
        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec start
        get_properties CPU time : 3.42605 secs
        emplace set sort CPU time : 0.916222 secs
        write stdout CPU time : 1.00049 secs
        total CPU time : 5.343 secs
        total wall clock time : 5 secs
        > diff f.tmp vec.tmp

        This is an average time of 5.35 secs (compared to 5.5 secs in the original post).

        New Timings for Short String (MAX_STR_LEN_L=6) Version

        > For some reason, short string version is faster when compiled with clang++
        > (while long string version is not)
        > clang++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp
        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec (fixed string length=6) start
        get_properties CPU time : 1.87217 secs
        emplace set sort CPU time : 0.589238 secs
        write stdout CPU time : 0.842179 secs
        total CPU time : 3.30369 secs
        total wall clock time : 3 secs
        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec (fixed string length=6) start
        get_properties CPU time : 1.95909 secs
        emplace set sort CPU time : 0.610479 secs
        write stdout CPU time : 0.959859 secs
        total CPU time : 3.52965 secs
        total wall clock time : 4 secs
        > ./llil2vec big1.txt big2.txt big3.txt >f.tmp
        llil2vec (fixed string length=6) start
        get_properties CPU time : 1.86549 secs
        emplace set sort CPU time : 0.608097 secs
        write stdout CPU time : 0.857176 secs
        total CPU time : 3.33087 secs
        total wall clock time : 3 secs
        > diff f.tmp vec.tmp

        This is an average time of 3.4 secs (compared to 4.3 secs in the original post).

        New version of llil2vec.cpp

        Interim OpenMP Versions

        Note: The interim versions below are just early (conservative) openmp-safe versions made thread-safe by expunging the dreaded strtok function.

        The one with the most potential for multi-threading seems to be llil3vec.cpp because it uses vectors which can be safely used lock-free in thread-local vec_loc vectors, as shown by marioroy in llil4vec.cpp.

        While replacing std::sort and std::stable_sort with __gnu_parallel versions to sort std::vectors is a no-brainer, it seems problematic to parallelise access to a global std::map (as used in llil3grt.cpp and llil3m.cpp below) because I think you'd need some sort of locking to safely allow multiple threads to update a shared global std::map. Ideas welcome.

        Interim version of llil3vec.cpp

        Timings of llil3vec

        > g++ -o llil3vec -std=c++20 -Wall -O3 llil3vec.cpp
        > time ./llil3vec big1.txt big2.txt big3.txt >f.tmp
        llil3vec (fixed string length=6) start
        get_properties CPU time : 1.47895 secs
        emplace set sort CPU time : 0.592783 secs
        write stdout CPU time : 0.832071 secs
        total CPU time : 2.90392 secs
        total wall clock time : 3 secs

        real 0m3.217s
        user 0m2.806s
        sys 0m0.412s
        > diff f.tmp vec.tmp

        > g++ -o llil3vec -std=c++20 -fopenmp -Wall -O3 llil3vec.cpp
        > time ./llil3vec big1.txt big2.txt big3.txt >f.tmp
        llil3vec (fixed string length=6) start
        get_properties CPU time : 2.5809 secs
        emplace set sort CPU time : 0.793355 secs
        write stdout CPU time : 0.860855 secs
        total CPU time : 4.23521 secs
        total wall clock time : 3 secs

        real 0m2.673s
        user 0m4.073s
        sys 0m0.465s
        > diff f.tmp vec.tmp

        For the -fopenmp version note that the Unix real time is lower, but the user time is higher.

        Interim version of llil3grt.cpp

        Interim version of llil3m.cpp

        Timings

        llil3grt (fixed string length=6) start - openmp version
        get_properties CPU time : 2.34487 secs
        emplace set sort CPU time : 0.942098 secs
        write stdout CPU time : 0.855157 secs
        total CPU time : 4.14223 secs
        total wall clock time : 4 secs

        real 0m4.752s
        user 0m4.241s
        sys 0m0.511s

        $ time ./llil3m big1.txt big2.txt big3.txt >f.tmp
        llil3m (fixed string length=6) start - openmp version
        get_properties CPU time : 2.40089 secs
        vector copy CPU time : 0.544294 secs
        vector stable sort CPU time : 5.84981 secs
        write stdout CPU time : 0.805509 secs
        total CPU time : 9.6006 secs
        total wall clock : 4 secs

        real 0m4.713s
        user 0m9.238s
        sys 0m0.679s

        llil3m (fixed string length=6) start - openmp version
        get_properties CPU time : 2.3031 secs
        vector copy CPU time : 0.544613 secs
        vector sort CPU time : 2.64047 secs
        write stdout CPU time : 0.791186 secs
        total CPU time : 6.27946 secs
        total wall clock : 4 secs

        real 0m4.182s
        user 0m6.128s
        sys 0m0.473s

        $ time ./llil3m big1.txt big2.txt big3.txt >f.tmp
        llil3m (fixed string length=6) start
        get_properties CPU time : 2.28072 secs
        vector copy CPU time : 0.471632 secs
        vector stable sort CPU time : 0.735042 secs
        write stdout CPU time : 0.67514 secs
        total CPU time : 4.16263 secs
        total wall clock : 4 secs

        real 0m4.470s
        user 0m4.159s
        sys 0m0.311s

        $ time ./llil3m big1.txt big2.txt big3.txt >f.tmp
        llil3m (fixed string length=6) start
        get_properties CPU time : 2.30618 secs
        vector copy CPU time : 0.473185 secs
        vector sort CPU time : 1.13081 secs
        write stdout CPU time : 0.668702 secs
        total CPU time : 4.57897 secs
        total wall clock : 4 secs

        real 0m4.889s
        user 0m4.558s
        sys 0m0.331s

        $ time ./llil3vec big1.txt big2.txt big3.txt >f.tmp
        llil3vec (fixed string length=6) start
        get_properties CPU time : 1.46864 secs
        emplace set sort CPU time : 0.630914 secs
        write stdout CPU time : 0.847912 secs
        total CPU time : 2.9476 secs
        total wall clock time : 3 secs

        real 0m3.233s
        user 0m2.852s
        sys 0m0.381s

        $ time ./llil3grt big1.txt big2.txt big3.txt >f.tmp
        llil3grt (fixed string length=6) start
        get_properties CPU time : 2.34418 secs
        emplace set sort CPU time : 0.901415 secs
        write stdout CPU time : 0.90025 secs
        total CPU time : 4.14595 secs
        total wall clock time : 5 secs

        real 0m4.784s
        user 0m4.232s
        sys 0m0.551s

        Updated: The source code for llil2vec.cpp was updated after the node was first created based on feedback from the replies. Added interim version llil3vec.cpp (with some marioroy improvements added). Plus interim versions of llil3grt.cpp and llil3m.cpp. Added a few missing #include <array> (thanks marioroy).

        The good news is... The bad news is...

        The good news is that I bring good news only! :) The modified J script is faster, more versatile, uses significantly less RAM, and has been tested with the 9.04 engine to parallelize the obvious low-hanging fruit for an additional speed boost.

        NB. -----------------------------------------------------------
        NB. --- This file is "llil4.ijs"
        NB. --- Run as e.g.:
        NB.
        NB.     jconsole.exe llil4.ijs big1.txt big2.txt big3.txt out.txt
        NB.
        NB. --- (NOTE: last arg is output filename, file is overwritten)
        NB. -----------------------------------------------------------

        pattern =: 0 1

        NB. ========> This line has a star in its right margin =======>   NB. *

        args =: 2 }. ARGV
        fn_out =: {: args
        fn_in =: }: args

        NB. PAD_CHAR =: ' '

        filter_CR =: #~ ~: & CR
        make_more_space =: ' ' I. @ ((LF = ]) +. (TAB = ])) } ]
        find_spaces =: I. @: = & ' '

        read_file =: {{
        'fname pattern' =. y
        text =. make_more_space filter_CR fread fname
        selectors =. (|.!.0 , {:) >: find_spaces text
        width =. # pattern
        height =. width <. @ %~ # selectors
        append_diffs =. }: , 2& (-~/\)
        shuffle_dims =. (1 0 3 & |:) @ ((2, height, width, 1) & $)
        selectors =. append_diffs selectors
        selectors =. shuffle_dims selectors
        literal =. < @: (}:"1) @: (];. 0) & text "_1
        numeric =. < @: (0&".) @: (; @: (<;. 0)) & text "_1
        extract =. pattern & {
        using =. 1 & \
        or_maybe =. `
        ,(extract literal or_maybe numeric) using selectors
        }}

        read_many_files =: {{
        'fnames pattern' =. y
        ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames   NB. *
        }}

        'words nums' =: read_many_files fn_in ; pattern

        t1 =: (6!:1) ''  NB. time since engine start

        'words nums' =: (~. words) ; words +//. nums   NB. *
        'words nums' =: (\: nums)& { &.:>"_1 words ; nums
        words =: ; nums < @ /:~/. words

        t2 =: (6!:1) ''  NB. time since engine start

        text =: , words ,. TAB ,. (": ,. nums) ,. LF
        erase 'words' ; 'nums'
        text =: (#~ ~: & ' ') text
        text fwrite fn_out
        erase < 'text'

        t3 =: (6!:1) ''  NB. time since engine start

        echo 'Read and parse input:    ' , ": t1
        echo 'Classify, sum, sort:     ' , ": t2 - t1
        echo 'Format and write output: ' , ": t3 - t2
        echo 'Total time:              ' , ": t3
        echo ''
        echo 'Finished. Waiting for a key...'
        stdin ''
        exit 0

        The code above doesn't (yet) include any 9.04 features and runs OK with 9.03, but I found 9.04 slightly faster in general. I also found 9.04 a bit faster on Windows, which is the opposite of what I saw for 9.03 (the script ran faster on Linux); let's shrug it off because of the 9.04 beta status and/or my antique PC. Results below are for beta 9.04 on Windows 10 (RAM usage taken from Windows Task Manager):

        > jconsole.exe llil4.ijs big1.txt big2.txt big3.txt out.txt
        Read and parse input: 1.501
        Classify, sum, sort: 2.09
        Format and write output: 1.318
        Total time: 4.909

        Finished. Waiting for a key...

        Peak working set (memory): 376,456K

        There are 3 star-marked lines. To patch for 9.04 new features to enable parallelization, replace them with these counterparts:

        {{ for. i. 3 do. 0 T. 0 end. }} ''
        ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
        'words nums' =: (~.t.'' words) , words +//. t.'' nums

        As you see, the 1st line replaces a comment; the 2nd and 3rd lines require just minor touches. The 2nd line launches reading and parsing of the input files in parallel. The 3rd line says to parallelize filtering for unique words and summing numbers according to the word classification. Kind of redundant double work, even as it was, as I see it. The 1st line starts 3 additional worker threads: I don't have more cores in my CPU anyway, and this script has no work easily dispatched to more workers. Then:

        Read and parse input: 0.992
        Classify, sum, sort: 1.849
        Format and write output: 1.319
        Total time: 4.16

        I would call my parallelization attempt, however crude it was, a success. Next is the output for our second "official" dataset in this thread:

        > jconsole.exe llil4.ijs long1.txt long2.txt long3.txt out.txt
        Read and parse input: 1.329
        Classify, sum, sort: 0.149
        Format and write output: 0.009
        Total time: 1.487

        ########################################################

        These are my results for the latest C++ solution (compiled using g++), to compare my efforts against:

        $ ./llil2vec_11149482 big1.txt big2.txt big3.txt >vec.tmp
        llil2vec start
        get_properties CPU time : 3.41497 secs
        emplace set sort CPU time : 1.04229 secs
        write stdout CPU time : 1.31578 secs
        total CPU time : 5.77311 secs
        total wall clock time : 5 secs

        $ ./llil2vec_11149482 long1.txt long2.txt long3.txt >vec.tmp
        llil2vec start
        get_properties CPU time : 1.14889 secs
        emplace set sort CPU time : 0.057158 secs
        write stdout CPU time : 0.003307 secs
        total CPU time : 1.20943 secs
        total wall clock time : 2 secs

        $ ./llil2vec_11149482 big1.txt big2.txt big3.txt >vec.tmp
        llil2vec (fixed string length=6) start
        get_properties CPU time : 2.43187 secs
        emplace set sort CPU time : 0.853877 secs
        write stdout CPU time : 1.33636 secs
        total CPU time : 4.62217 secs
        total wall clock time : 5 secs

        I noticed that the new C++ code, supposed to be faster, is actually slower (compared to llil2grt) with the "long" dataset of the two "official" datasets used in this thread.

Re: Rosetta Code: Long List is Long - JudySL code
by marioroy (Parson) on Jan 24, 2023 at 02:50 UTC

    It has been a long journey. I'm completing my participation with a JudySL array implementation (great for minimum memory consumption).

    Updated, 2023-01-30

    Ensure random data:

    This makes for a worst case scenario. The vector implementations are fast, but consume a lot of memory.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use List::Util 'shuffle';

    my @arr = shuffle <>;
    print @arr;
    ./shuffle.pl big1.txt >tmp && mv tmp big1.txt
    ./shuffle.pl big2.txt >tmp && mv tmp big2.txt
    ./shuffle.pl big3.txt >tmp && mv tmp big3.txt
    ./shuffle.pl long1.txt >tmp && mv tmp long1.txt
    ./shuffle.pl long2.txt >tmp && mv tmp long2.txt
    ./shuffle.pl long3.txt >tmp && mv tmp long3.txt

    Running:

    The Judy demonstration consumes far less memory than the others. Eventually, I was able to run it in parallel. The Judy and Map binaries process variable-length keys; therefore, I commented out the MAX_STR_LEN_L define in llil4vec for an apples-to-apples comparison. The testing consumes one core for get_properties and parallel processing (max 8 threads) for sorting.

    $ ./llil4judy big{1,2,3}.txt long{1,2,3}.txt >out.txt
    llil4judy start
    don't use OpenMP
    use boost sort
    get properties time : 3.34859 secs
    finish merging time : 0.675672 secs
    vector stable sort time : 0.299534 secs
    write stdout time : 0.273699 secs
    total time : 4.59755 secs

    $ ./llil4map-std big{1,2,3}.txt long{1,2,3}.txt >out.txt
    llil4map start
    use std::map
    don't use OpenMP
    use boost sort
    get properties time : 11.5932 secs
    finish merging time : 1.81754 secs
    vector stable sort time : 0.286298 secs
    write stdout time : 0.296913 secs
    total time : 13.994 secs

    $ ./llil4map-uno big{1,2,3}.txt long{1,2,3}.txt >out.txt
    llil4map start
    use std::unordered_map
    don't use OpenMP
    use boost sort
    get properties time : 3.705 secs
    finish merging time : 1.18178 secs
    vector stable sort time : 0.363231 secs
    write stdout time : 0.267898 secs
    total time : 5.51796 secs

    $ ./llil4map-flat big{1,2,3}.txt long{1,2,3}.txt >out.txt
    llil4map start
    use phmap::parallel_flat_hash_map
    don't use OpenMP
    use boost sort
    get properties time : 2.27721 secs
    finish merging time : 0.153649 secs
    vector stable sort time : 0.368893 secs
    write stdout time : 0.26761 secs
    total time : 3.06742 secs

    $ ./llil4vec-no6 big{1,2,3}.txt long{1,2,3}.txt >out.txt
    llil4vec start
    don't use OpenMP
    use boost sort
    get properties time : 0.683441 secs
    sort properties time : 0.648031 secs
    vector stable sort time : 0.634999 secs
    write stdout time : 0.316875 secs
    total time : 2.28339 secs

    The J script solution from Anonymous Monk.

    ~/j904/bin/jconsole llil5.ijs big{1,2,3}.txt long{1,2,3}.txt out.txt
    Read and parse input: 2.44621
    Classify, sum, sort: 4.82435
    Format and write output: 2.05906
    Total time: 9.32962

    Links to other llil4 demonstrations:

    llil4map
    llil4vec, llil4vec-tuple
    llil4vec-tbb

    llil4judy

      The following are test runs for 60+ input files (big.txt and long.txt variants). Note: check memory consumption before running llil4vec-no6-omp.

      Q. Why did I persist, personally? A. From seeing the high memory consumption of the vector demonstrations. It felt awkward to leave or stop, so to speak. I wondered what was possible, especially for Chuma and the originating Long list is long thread, to consume less memory. Note: Chuma states, "If it matters the original lists are sorted, they don't all contain the same words, all numbers are positive integers." For clarity, the lists are sorted by value.
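      As an aside, the output ordering required by the LLiL spec at the top of this thread (descending by count, ties broken ascending by name) can be expressed as a single Perl sort. A minimal sketch, using a hypothetical %count hash standing in for the merged results:

```perl
use strict;
use warnings;

# Hypothetical merged counts, matching the example in the LLiL spec.
my %count = (hello => 12345, pearl => 94, dromedary => 70, camel => 50);

# Sort descending by count; for equal counts, sort ascending by name.
for my $name (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$name\t$count{$name}\n";    # hello first, camel last
}
```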

      Updated, 2023-01-31

      Links to the llil(4) demonstrations and test file generators:

      The (4) variants are based on llil3vec, by eyepopslikeamosquito. The 4 series attempt parallel sorting via boost, plus fast_io, OpenMP, JudySL, unordered_map, and the parallel flat-hashmap. The tuple demonstration was something I tried as well, replacing std::pair with std::tuple.

      llil4judy, shuffle.pl
      llil4map
      llil4vec, llil4vec-tuple
      llil4vec-tbb
      gen-llil.pl, gen-long-llil.pl

      Generating input files:

      The generated data is mostly sorted by key, so I shuffled a couple of input files for extra randomness. hv mentioned the lists are possibly sorted by value.

      perl gen-llil.pl big1.txt 200 3 1
      perl gen-llil.pl big2.txt 200 3 1
      perl gen-llil.pl big3.txt 200 3 1
      perl shuffle.pl big2.txt > tmp.txt
      mv tmp.txt big2.txt

      perl gen-long-llil.pl long1.txt 600
      perl gen-long-llil.pl long2.txt 600
      perl gen-long-llil.pl long3.txt 600
      perl shuffle.pl long2.txt > tmp.txt
      mv tmp.txt long2.txt

      This is another way to shuffle content and write back, via the sponge utility.

      perl shuffle.pl big2.txt | sponge big2.txt
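      If sponge is not available, the same in-place shuffle can be done in Perl alone: slurp the lines, shuffle, then rewrite the file. A sketch (shuffle_in_place is my name for the helper, not part of the posted scripts):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'shuffle';

# Shuffle the lines of a file in place: read all lines into memory,
# shuffle them, then overwrite the same file.
sub shuffle_in_place {
    my ($file) = @_;
    open my $in, '<', $file or die "$file: $!";
    my @lines = shuffle <$in>;
    close $in;
    open my $out, '>', $file or die "$file: $!";
    print $out @lines;
    close $out;
}

shuffle_in_place($_) for @ARGV;
```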

      Making additional input files (total: 30 big files):

      cp big1.txt big4.txt
      cp big2.txt big5.txt
      cp big3.txt big6.txt
      cp big1.txt big7.txt
      cp big2.txt big8.txt
      cp big3.txt big9.txt
      cp big1.txt biga.txt
      cp big2.txt bigb.txt
      cp big3.txt bigc.txt
      cp big1.txt bigd.txt
      cp big2.txt bige.txt
      cp big3.txt bigf.txt
      cp big1.txt bigg.txt
      cp big2.txt bigh.txt
      cp big3.txt bigi.txt
      cp big1.txt bigj.txt
      cp big2.txt bigk.txt
      cp big3.txt bigl.txt
      cp big1.txt bigm.txt
      cp big2.txt bign.txt
      cp big3.txt bigo.txt
      cp big1.txt bigp.txt
      cp big2.txt bigq.txt
      cp big3.txt bigr.txt
      cp big1.txt bigs.txt
      cp big2.txt bigt.txt
      cp big3.txt bigu.txt
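      The run of cp commands above could also be generated with a short Perl loop that cycles through the three source files. A sketch (replicate_big_files is a hypothetical helper, not part of the posted code):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy 'copy';

# Replicate big{1,2,3}.txt into 27 more copies (big4.txt .. bigu.txt),
# cycling through the three sources just as the cp commands do.
sub replicate_big_files {
    my ($dir) = @_;
    my @src   = map "$dir/big$_.txt", 1 .. 3;
    my @names = (4 .. 9, 'a' .. 'u');    # suffixes for the 27 copies
    for my $i (0 .. $#names) {
        copy($src[$i % 3], "$dir/big$names[$i].txt")
            or die "copy failed: $!";
    }
}

replicate_big_files('.') if -e 'big1.txt';
```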

      Running - AMD box with 8 cores, 16 threads:

      Testing with 60 files is possible simply by passing big?.txt twice as arguments (ditto for long?.txt). This means building llil4vec with MAX_STR_LEN_L commented out. Your check-sum will differ from mine, as the gen-llil.pl script generates random words.

      The C++ Parallel Hashmap container is fast. It consumes less memory than std::map and std::unordered_map. "By using parallel SSE2 instructions, these hashmaps are able to look up items by checking 16 slots in parallel, allowing the implementation to remain fast even when the table is filled up to 87.5% capacity."

      Most amazing is that the Judy array consumes the least amount of memory.

      Running, 600 list files:

      For this, we need more list files (now 60 in total). Later, I will run with big?.txt 10 times as arguments. Note: 60 files consume 2 GB on disk.

      I captured results for 9 and 15 threads. The append approach of the vector demonstration (llil4vec-omp, fixed length) is not suitable for this use case, due to excessive memory consumption. The OP in the originating thread mentioned greater than 2,000 list files.

      Side note: eyepopslikeamosquito and I used the llil{3,4}vec.cpp examples to benchmark many aspects of the application. The Judy array and Parallel Hashmap have a smaller memory footprint.

      Running, 1,200 list files:

      I ran with big?.txt 20 times as arguments. The append-to-vector strategy leaves llil4vec-no6-omp and llil4vec-omp unable to process more files once memory capacity is exceeded.

      Running on a 32-core box, 1,200 list files:

      It turns out that llil4map-flat-omp benefits from more CPU cores, allowing it to catch up with llil4judy-omp. Memory bandwidth limitations may be why it cannot run faster. This journey has shown me that there are many variables to consider: one implementation may be better than another, depending on hardware specifications. I cannot make llil4map.cpp or llil4judy.cpp run any faster. Merging occurs immediately as threads complete get_properties.

Node Type: perlmeditation [id://11148465]
Approved by choroba
Front-paged by kcott