http://www.perlmonks.org?node_id=153538


in reply to Re: (FASTER) Example of TMTOWTDI
in thread Extreme Example of TMTOWTDI

Hmm. Well, I have to admit this thread was a bit of a surprise.

I figured the overhead of loading the whole thing into memory and then spitting it out was unnecessary, so I wrote a variant that opens all the files first and then writes each line to the appropriate one. Trouble was, it wasn't faster. :( So after a bit of tweaking and some less-than-pretty (IMO) code, I managed to soup it up to be the same speed or just slightly faster than your concatenation version. (Unfortunately, in the interest of fairness I applied those optimizations that I could to your variant, and once done it's faster than my version. :-)

Incidentally, I used a dictionary of 320k words, ranging from 2 to 29 chars. I'm pretty sure that under some memory constraints the approach I took would be more appropriate than the other two (and, vice versa, that on some OSes the approach would be impractical due to restrictions on the number of open filehandles allowed), but I was surprised that it wasn't faster to start with. It seems the overhead of switching between filehandles is higher than I thought.

Benchmark: timing 10 iterations of ch0as, demerphq, gmax, gmax_dem...
     ch0as: 45 wallclock secs (31.86 usr +  1.86 sys = 33.72 CPU) @  0.30/s (n=10)
  demerphq: 23 wallclock secs (19.05 usr +  2.26 sys = 21.31 CPU) @  0.47/s (n=10)
      gmax: 27 wallclock secs (21.03 usr +  1.94 sys = 22.97 CPU) @  0.44/s (n=10)
  gmax_dem: 25 wallclock secs (18.16 usr +  1.61 sys = 19.77 CPU) @  0.51/s (n=10)
         s/iter    ch0as     gmax demerphq gmax_dem
ch0as      3.37       --     -32%     -37%     -41%
gmax       2.30      47%       --      -7%     -14%
demerphq   2.13      58%       8%       --      -7%
gmax_dem   1.98      71%      16%       8%       --

Oh, and I made some slight modifications to your and ch0as's snippets, such as not using global filehandles and using a better-formatted filename. These are consistent across the variants, so they shouldn't have compromised the benchmark in any way.
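For anyone curious, the "not using global filehandles" change looks roughly like this in Perl; a generic sketch of the idiom, not the actual diff applied to the snippets:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Global (bareword) filehandle -- package-wide, easy to clobber:
#   open OUT, ">out.txt" or die $!;
#
# Lexical filehandle -- scoped to the block, closed automatically
# when $out goes out of scope (three-arg open is also safer):
open my $out, ">", "out.txt" or die "out.txt: $!";
printf $out "%02d\n", 7;    # zero-padded, like the sprintf'd filenames
close $out or die "close: $!";
```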

BTW: I just completed a crossword/Scrabble-solving engine in Perl. One of the avenues I pursued was storing words based on their length. This strategy turned out to be inappropriate. I ended up using a custom hashing algorithm where the hashed value had semantic meaning about the words it matched. This enabled me to rule out tens to thousands of words with one binary AND. (I ended up getting a turn of a Scrabble game down to around 2-3 secs.) If anybody's interested in it, let me know.
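The hashing scheme itself isn't shown here, but one common way to give a hash value "semantic meaning" for this kind of filtering is a 26-bit letter-presence bitmask. The following is a hypothetical sketch along those lines (letter_mask and maybe_playable are my names, not the engine's):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: bit N of the mask is set if the word
# contains the letter chr(ord('a') + N).
sub letter_mask {
    my ($word) = @_;
    my $mask = 0;
    $mask |= 1 << (ord($_) - ord('a')) for split //, lc $word;
    return $mask;
}

# A word can't possibly be formed from the rack if it uses a letter
# the rack doesn't have at all -- one binary AND rules it out,
# without comparing individual letter counts.
sub maybe_playable {
    my ($word_mask, $rack_mask) = @_;
    return ($word_mask & ~$rack_mask) == 0;
}

my $rack = letter_mask("aelpst");
print "apple: ", (maybe_playable(letter_mask("apple"), $rack) ? "maybe" : "no"), "\n";
print "zebra: ", (maybe_playable(letter_mask("zebra"), $rack) ? "maybe" : "no"), "\n";
# prints "apple: maybe" and "zebra: no"
```

Words that survive the AND still need a full letter-count check, but the cheap test prunes most of the dictionary first.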

Here's the code I benchmarked:

#!/usr/bin/perl -w
use strict;
use Benchmark 'cmpthese';

# Write each line directly to a pre-opened per-length filehandle.
sub demerphq {
    open my $IN, "words.txt" or die "words.txt:$!";
    my @fh;
    my $nl_len = length("\n");
    foreach (1..35) {
        open $fh[$_ + $nl_len], sprintf(">./len/%02d.demerphq", $_)
            or die "$_.demerphq:$!";
    }
    # Lines still carry their newline, hence the $nl_len offset above.
    print {$fh[length $_]} $_ while <$IN>;
    $_ && close $_ foreach @fh;
    # Remove the files that ended up empty.
    foreach (1..35) {
        unlink sprintf("./len/%02d.demerphq", $_)
            unless -s sprintf("./len/%02d.demerphq", $_);
    }
    close $IN;
}

# Collect lines in per-length arrays, then write each file in one go.
sub ch0as {
    my %Word;
    my $File;
    open my $IN, "words.txt" or die "words.txt:$!";
    push @{$Word{length $_}}, $_ while (<$IN>);
    for (keys %Word) {
        $File = sprintf("./len/%02d.ch0as", $_ - 1);
        open my $OUT, ">$File" or die "Cannot open $File: $!\n";
        print $OUT @{$Word{$_}};
        close $OUT;
    }
}

# Concatenate lines into per-length strings instead of arrays.
sub gmax {
    my %Word;
    my $File;
    open my $IN, "words.txt" or die "words.txt:$!";
    $Word{length $_} .= $_ while (<$IN>);
    for (keys %Word) {
        $File = sprintf("./len/%02d.gmax", $_ - 1);
        open my $OUT, ">$File" or die "Cannot open $File: $!\n";
        print $OUT $Word{$_};
        close $OUT;
    }
}

# Same as gmax, but keyed by an array rather than a hash.
sub gmax_dem {
    my @Word;
    my $File;
    open my $IN, "words.txt" or die "words.txt:$!";
    $Word[length $_] .= $_ while (<$IN>);
    for (0..$#Word) {
        next unless $Word[$_];
        $File = sprintf("./len/%02d.gmax.dem", $_ - 1);
        open my $OUT, ">$File" or die "Cannot open $File: $!\n";
        print $OUT $Word[$_];
        close $OUT;
    }
}

cmpthese(10, {
    demerphq => \&demerphq,
    gmax     => \&gmax,
    gmax_dem => \&gmax_dem,
    ch0as    => \&ch0as,
});

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.