Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: (FASTER) Example of TMTOWTDI

by gmax (Abbot)
on Mar 22, 2002 at 10:01 UTC ( [id://153525]=note: print w/replies, xml ) Need Help??


in reply to Re: Extreme Example of TMTOWTDI
in thread Extreme Example of TMTOWTDI

Maybe it's an even dirtier hack, but this one can speed up things a lot.
Your script saves time by skipping chomp. My addition will save time by avoiding a hidden join for each array interpolation. I found this trick when I was benchmarking anagram algorithms.
I used a 116_246 words dictionary, and I got a significative speed improvement, as you can see from the relative times.
$ wc words.english 116246 116246 989075 words.english $ time perl dict.pl words.english # first hack 1.38 user 0.05 system 0:01.43 elapsed $ time perl dict2.pl words.english # second hack 0.86 user 0.02 system 0:00.88 elapsed
Change
push @{$Word{length$_}},$_ while (<>); # ... print OUT @{$Word{$_}};
into
$Word{length$_} .= $_ while (<>); # ... print OUT $Word{$_};

BTW: ChOas, congratulations for your sanctification!

update
There is always room for improvement, though. ...
Combining the first array solution with the other tricks, we have a further 10% improvement.

#!/usr/bin/perl -w use strict; my @words; my $file; $words[length$_] .= $_ while (<>); for (0..$#words) { next unless $words[$_]; $file=$_-1 . ".mine"; open OUT,">$file" or die "Cannot open $file.mine: $!\n"; print OUT $words[$_]; close OUT; };
update (2)
I am afraid that I must have spoiled demerphq's comparison with my previous update. ;)
He very chivalrously included such update and compared the methods shown so far, offering an improvement, that was unfortunately slower than this array + concatenation hack.
Nice Job. Keep up the good work, bro, and many thanks for your analysis.
 _  _ _  _  
(_|| | |(_|><
 _|   

Replies are listed 'Best First'.
Re: Re: (FASTER) Example of TMTOWTDI
by demerphq (Chancellor) on Mar 22, 2002 at 11:53 UTC
    Hmm. Well I have to admit this thread was a bit of a suprise.

    I figured the overhead of loading the thing into memory and then spitting it out was unecessary so I wrote a variant that opens the files first and then writes to the apropriate one for each line. Trouble was it wasnt faster. :( So after a bit of tweaking and some less than pretty (imo) code I managed to soup it up to be the same speed or just slightly faster than your concatenation version. (Unfortunately in the interest of fairness I applied those optimizations that i could to your variant and once done its faster than my version. :-)

    Incidentally I used a dictionary of 320k words, ranging from 2 to 29 chars. Im pretty sure that under some memory sizes the approach I took would be more appropriate than the other two (and vice versa that on some OS's the approach would be impractical due to restrictions on the number of filehandles allowed), but I was suprised that it wasnt faster to start off with. It seems the overhead of switching between filehandles is higher than I thought. <super>

    Benchmark: timing 10 iterations of ch0as, demerphq, gmax, gmax_dem...
         ch0as: 45 wallclock secs (31.86 usr +  1.86 sys = 33.72 CPU) @  0.30/s (n=10)
      demerphq: 23 wallclock secs (19.05 usr +  2.26 sys = 21.31 CPU) @  0.47/s (n=10)
          gmax: 27 wallclock secs (21.03 usr +  1.94 sys = 22.97 CPU) @  0.44/s (n=10)
      gmax_dem: 25 wallclock secs (18.16 usr +  1.61 sys = 19.77 CPU) @  0.51/s (n=10)
             s/iter    ch0as     gmax demerphq gmax_dem
    ch0as      3.37       --     -32%     -37%     -41%
    gmax       2.30      47%       --      -7%     -14%
    demerphq   2.13      58%       8%       --      -7%
    gmax_dem   1.98      71%      16%       8%       --
    </super> Oh and I made some slight modifications to your and ch0as's snippets, such as not using global filehandles and using a better formatted filename. These are consistent across the variants so they shouldnt have compromised the benchmark in any way.

    BTW: I just completed a crossword/scrabble solving engine in perl. One of the avenues that I pursued was storing words based on their length. This strategy turned out to be inappropriate. I ended up using a custom hashing algorithm where the hashed value had semantic meaning about the words it matched. This enabled me to rule out 10s-1000s of words with one binary and. (I ended up getting a turn of a scrabble game down to around 2-3 secs.) If anybodys interested in it let me know.

    Heres the code I benchmarked:

    #!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; sub demerphq { open my $IN,"words.txt" or die "words.txt:$!"; my $outf; my @fh; my $nl_len=length("\n"); foreach (1..35) { open $fh[$_+$nl_len],sprintf(">./len/%02d.demerphq",$_) or die + "$_.demerphq:$!"; } print {$fh[length $_]} $_ while <$IN>; $_ && close $_ foreach @fh; foreach (1..35) { unlink sprintf("./len/%02d.demerphq",$_) unless -s sprintf("./ +len/%02d.demerphq",$_); } close $IN; } sub ch0as { my %Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; push @{$Word{length$_}},$_ while (<$IN>); for (keys %Word) { $File=sprintf("./len/%02d.ch0as",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT @{$Word{$_}}; close $OUT; }; } sub gmax { my %Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; $Word{length$_} .= $_ while (<$IN>); for (keys %Word) { $File=sprintf("./len/%02d.gmax",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT $Word{$_}; close $OUT; }; } sub gmax_dem { my @Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; $Word[length$_] .= $_ while (<$IN>); for (0..$#Word) { next unless $Word[$_]; $File=sprintf("./len/%02d.gmax.dem",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT $Word[$_]; close $OUT; }; } cmpthese(10,{ demerphq => \&demerphq, gmax => \&gmax, gmax_dem => \&gmax_dem, ch0as => \&ch0as });

    Yves / DeMerphq
    ---
    Writing a good benchmark isnt as easy as it might look.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://153525]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2024-04-19 21:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found