Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)

by hdb (Prior)
on Oct 09, 2013 at 09:40 UTC ( #1057521=note: print w/ replies, xml ) Need Help??


in reply to Re: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
in thread Challenge: 8 Letters, Most Words

It is quite impressive. I have not yet understood how it works. It seems not to work on the following dictionary:

exactest one two three four five six seven eight nine ten

where one solution is eehnortw covering four words but your script says [1] aceesttx : exactest.

Update: After further study it looks to me, that you are checking all 8 letter classes derived from the 8 letter words in the dictionary and see how many words are captured. Which is probably giving you the correct solution for many real world dictionaries. This way, you do avoid the combinatorial explosion that makes this challenge difficult...

Still quite impressive!


Comment on Re^2: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
Select or Download Code
Re^3: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
by LanX (Canon) on Oct 09, 2013 at 12:20 UTC

      Even if it is not guaranteed to deliver the correct solution, it is so much faster than my attempt that also gives the same answer. My brute force appraoch already runs for more than 24 hours...

        > My brute force appraoch already runs for more than 24 hours...

        to calculate the dictionary word with the highest coverage??? o_O

        I'm pretty sure that we had that already.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re^3: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
by BrowserUk (Pope) on Oct 09, 2013 at 16:37 UTC

    It does recurse through all 8-letter combos, but short-circuits at each level if the combination so far cannot be derived from the dictionary supplied.

    next unless exists $tree->{ $_ };

    If the letter combination for your 'eehnortw' existed in the dictionary, it would be found very quickly:

    [17:25:23.81] C:\test>type hdb.dict exactest one two three four five six seven eight nine ten torewhen [17:25:47.91] C:\test>1056884-calc hdb.dict 0.00380706787109375 [5] eehnortw : torewhen three one two ten [1] aceesttx : exactest

    Comment out the short-circuit and it will consider all 8-letter possibilities .. and run more slowly.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Challenge: 8 Letters, Most Words (2of12int.txt in 20^H^H 10 secs; Pure Perl)
by BrowserUk (Pope) on Oct 09, 2013 at 18:47 UTC

    There are 25 letter sets that will cover 4 words of your sample dictionary:

    [19:35:10.42] C:\test>1056884-calc hdb.dict 2aaaaenot 3aaaenotw 4aeinnotw 373.12408208847 [4] einnootw : one two nine ten [4] efinnotv : five one nine ten [4] einnotvw : one two nine ten [4] eenostvw : one two seven ten [4] einnotuw : one two nine ten [4] einnostw : one two nine ten [4] eghinnot : one eight nine ten [4] ceinnotw : one two nine ten [4] eehnortw : three one two ten [4] efinnotw : one two nine ten [4] efinotvw : five one two ten [4] eiinnotw : one two nine ten [4] eeinnotw : one two nine ten [4] eghinotw : one two eight ten [4] ehinnotw : one two nine ten [4] einnortw : one two nine ten [4] einnotwx : one two nine ten [4] einostwx : six one two ten [4] einnnotw : one two nine ten [4] eginnotw : one two nine ten [4] efnortuw : one two four ten [4] einnotww : one two nine ten [4] aeinnotw : one two nine ten [4] einnottw : one two nine ten [4] einnostx : six one nine ten [3] eenostwx : one two ten ...

    375 seconds isn't as impressive, but better than 24 hours :)

    This makes use of another optimisation that only benefits when the dictionary is small like yours:

    #! perl -slw use strict; use Time::HiRes qw[ time ]; $|++; sub uniq{ my %x; @x{@_} = (); keys %x } my $start = time; my @lookup = map{ my $bits = pack 'C', $_; [ grep vec( $bits, $_, 1 ), 0 .. 7 ] } 0 .. 255; my %dict; my %alphabet; while( <> ) { chomp; next if length > 8; my $r = \%dict; undef( $alphabet{ $_ } ), $r = $r->{ $_ } //= {} for sort split '' +, $_; push @{ $r->{_} }, $_; } my @alphabet = sort keys %alphabet; my $best = [ 0, '' ]; my %stats; sub X { my( $first, $soFar, $tree ) = @_; if( @$soFar == 8 ) { my @words = uniq map { my $r = \%dict; $r = $r->{ $_ } for @{ $soFar }[ @{ $lookup[ $_ ] } ]; exists $r->{_} ? @{ $r->{_} } : (); } 0 .. 255; return unless @words > 1; print @{ $best = [ scalar @words, join '', @$soFar ] } if @wor +ds > $best->[0]; $stats{ join '', @$soFar } = \@words; return; } for( grep $_ ge $first, @alphabet ) { # next unless exists $tree->{ $_ }; X( $_, [ @$soFar, $_ ], $tree->{ $_ } ); } return; } X( 'a', [], \%dict ); print time - $start; <STDIN>; printf "[%d] %s : @{[ @{$stats{ $_ }} ]}\n", scalar @{$stats{$_}}, $_ +for sort{ @{ $stats{ $b } } <=> @{ $stats{ $a } } } keys %stats;

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    @$soFar, $_

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1057521]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (13)
As of 2014-12-18 12:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (51 votes), past polls