Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Hmm. Well I have to admit this thread was a bit of a suprise.

I figured the overhead of loading the thing into memory and then spitting it out was unecessary so I wrote a variant that opens the files first and then writes to the apropriate one for each line. Trouble was it wasnt faster. :( So after a bit of tweaking and some less than pretty (imo) code I managed to soup it up to be the same speed or just slightly faster than your concatenation version. (Unfortunately in the interest of fairness I applied those optimizations that i could to your variant and once done its faster than my version. :-)

Incidentally I used a dictionary of 320k words, ranging from 2 to 29 chars. Im pretty sure that under some memory sizes the approach I took would be more appropriate than the other two (and vice versa that on some OS's the approach would be impractical due to restrictions on the number of filehandles allowed), but I was suprised that it wasnt faster to start off with. It seems the overhead of switching between filehandles is higher than I thought. <super>

Benchmark: timing 10 iterations of ch0as, demerphq, gmax, gmax_dem...
     ch0as: 45 wallclock secs (31.86 usr +  1.86 sys = 33.72 CPU) @  0.30/s (n=10)
  demerphq: 23 wallclock secs (19.05 usr +  2.26 sys = 21.31 CPU) @  0.47/s (n=10)
      gmax: 27 wallclock secs (21.03 usr +  1.94 sys = 22.97 CPU) @  0.44/s (n=10)
  gmax_dem: 25 wallclock secs (18.16 usr +  1.61 sys = 19.77 CPU) @  0.51/s (n=10)
         s/iter    ch0as     gmax demerphq gmax_dem
ch0as      3.37       --     -32%     -37%     -41%
gmax       2.30      47%       --      -7%     -14%
demerphq   2.13      58%       8%       --      -7%
gmax_dem   1.98      71%      16%       8%       --
</super> Oh and I made some slight modifications to your and ch0as's snippets, such as not using global filehandles and using a better formatted filename. These are consistent across the variants so they shouldnt have compromised the benchmark in any way.

BTW: I just completed a crossword/scrabble solving engine in perl. One of the avenues that I pursued was storing words based on their length. This strategy turned out to be inappropriate. I ended up using a custom hashing algorithm where the hashed value had semantic meaning about the words it matched. This enabled me to rule out 10s-1000s of words with one binary and. (I ended up getting a turn of a scrabble game down to around 2-3 secs.) If anybodys interested in it let me know.

Heres the code I benchmarked:

#!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; sub demerphq { open my $IN,"words.txt" or die "words.txt:$!"; my $outf; my @fh; my $nl_len=length("\n"); foreach (1..35) { open $fh[$_+$nl_len],sprintf(">./len/%02d.demerphq",$_) or die + "$_.demerphq:$!"; } print {$fh[length $_]} $_ while <$IN>; $_ && close $_ foreach @fh; foreach (1..35) { unlink sprintf("./len/%02d.demerphq",$_) unless -s sprintf("./ +len/%02d.demerphq",$_); } close $IN; } sub ch0as { my %Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; push @{$Word{length$_}},$_ while (<$IN>); for (keys %Word) { $File=sprintf("./len/%02d.ch0as",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT @{$Word{$_}}; close $OUT; }; } sub gmax { my %Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; $Word{length$_} .= $_ while (<$IN>); for (keys %Word) { $File=sprintf("./len/%02d.gmax",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT $Word{$_}; close $OUT; }; } sub gmax_dem { my @Word; my $File; open my $IN,"words.txt" or die "words.txt:$!"; $Word[length$_] .= $_ while (<$IN>); for (0..$#Word) { next unless $Word[$_]; $File=sprintf("./len/%02d.gmax.dem",$_-1); open my $OUT,">$File" or die "Cannot open $File: $!\n"; print $OUT $Word[$_]; close $OUT; }; } cmpthese(10,{ demerphq => \&demerphq, gmax => \&gmax, gmax_dem => \&gmax_dem, ch0as => \&ch0as });

Yves / DeMerphq
Writing a good benchmark isnt as easy as it might look.

In reply to Re: Re: (FASTER) Example of TMTOWTDI by demerphq
in thread Extreme Example of TMTOWTDI by Cody Pendant

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others meditating upon the Monastery: (7)
    As of 2020-02-21 19:03 GMT
    Find Nodes?
      Voting Booth?
      What numbers are you going to focus on primarily in 2020?

      Results (97 votes). Check out past polls.