Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re^4: Challenge: 8 Letters, Most Words

by choroba (Bishop)
on Oct 04, 2013 at 21:57 UTC ( #1056942=note: print w/replies, xml ) Need Help??

in reply to Re^3: Challenge: 8 Letters, Most Words
in thread Challenge: 8 Letters, Most Words

I am getting somehow similar results. The difference is, I guess, that I initialize the count with the number of occurrences of the combination before adding the shorter words. The script runs for 3min26s, the output ends with
adeimrst: 310 aeilnrst: 312 adeinrst: 313 aeloprst: 313 aeilnpst: 314 adeoprst: 316 adeilrst: 319 aeimprst: 328 adeiprst: 332 aeilprst: 344 aeinprst: 346

The script:

#!/usr/bin/perl use warnings; use strict; use feature qw(say); my %lc_words; my $dict = shift; my $FH; open $FH, '<', $dict or $FH = *DATA; while (<$FH>) { chomp; next if length > 8; my $lc = lc; $lc_words{$lc} = 1; } say scalar keys %lc_words; my %sorted_count; for (keys %lc_words) { my @letters = sort split //; my $sorted = join q(), @letters; $sorted_count{$sorted}++; } say "$_: $sorted_count{$_}" for sort { $sorted_count{$a} <=> $sorted_count{$b} } keys %sorted_count; print '-' x 78, "\n"; my %summed = %sorted_count; for my $length (1 .. 7) { warn $length; for my $sorted (grep $length == length, keys %sorted_count) { my $regex = join '.*', split //, $sorted; for my $longer (grep $length < length, keys %sorted_count) { $summed{$longer} += $sorted_count{$sorted} if $longer =~ $ +regex; } } } say "$_: $summed{$_}" for sort { $summed{$a} <=> $summed{$b} } keys %summed; __DATA__ ffffffff fffffffa afffffff bfffffff fffffffb ffffbfff a aa aaa aaaa aaaaa aaaaaa

Update: Checking the result:

$ grep -E '^[aeinprst]{1,8}$' 2of12inf.txt | grep -vE '(.).*\1' | wc +-l 346

Update 2: Testing duplicate characters:

$ grep -E '(.).*\1.*:' 1056884.out | tail -n1 aeiprsst: 279 $ grep -E '^[aeiprsst]{1,8}$' 2of12inf.txt | grep -Ev '([aeiprt]).*\1| +s.*s.*s' | wc -l 279
لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Replies are listed 'Best First'.
Re^5: Challenge: 8 Letters, Most Words
by Limbic~Region (Chancellor) on Oct 05, 2013 at 03:42 UTC
    I am bleary eye tired but I wanted to review your code because you mentioned it didn't consider all combinations - only possible combinations.

    If I have understood what you have done, this is not guaranteed to produce the correct answer on all dictionary files. It happens to get the right answer for 2of12inf.txt however. You appear to only be considering the combinations of letters of individual words rather than considering taking some from one word and some from another. Consider the following dictionary:

    abcd acdb adbc dabc bcad efgh fgeh hegf egfh fegh abcdxy efghlm
    The obvious answer is 'abcdefgh' which will get 10 words but your code chooses both 'abcdxy' and 'efghlm' with 6 words each.

    Cheers - L~R

      One should probably postprocess the results if the winner is shorter than 8. I will look into it later.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Something like the following? It slows the algorithm terribly, though...
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1056942]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (8)
As of 2018-06-19 12:52 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (114 votes). Check out past polls.