Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
All,
I have an idea for an exhaustive solution. I haven't convinced myself it will work yet but it will require multiple phases and at least one of the phases will need to be written in C. I will update when I can play even if it turns out not to be viable.

Update: Ok, I finally have something running that appears to work though I won't know for some time if it is correct. I ended up using 2 pieces of code. A short perl script and a C program that I will share despite the fact it is probably horrible.

Perl script. Used to filter the dictionary as well as determine the maximum number of times each letter is repeated in a word.

#!/usr/bin/perl use strict; use warnings; my %max; my %seen; open(my $fh, '<', '2of12inf.txt') or die "Unable to open '2of12inf.txt +' for reading: $!\n"; while (<$fh>) { chomp; $_ = lc($_); tr/a-z//cd; next if ! $_ || length($_) > 8 || $seen{$_}++; print "$_\n"; my %uniq; ++$uniq{$_} for split //; for (keys %uniq) { $max{$_} = $uniq{$_} if ! $max{$_} || $uniq{$_} > $max{$_}; } } for my $l (sort keys %max) { warn "$l:$max{$l}\n"; }

The C program. It uses a high water mark algorithm. To ensure it was working, I had it output every time the high water mark was reached or exceeded rather than waiting all the way until the end.

#include <stdio.h> #include <stdlib.h> // Program to count how many 8 character combinations are necessary fo +r brute force // Constants to remove magic numbers #define LINE_LEN 11 #define LAST_LETTER 25 #define CONDENSED 16 #define ASCII 97 // 'a' is 97 in ASCII #define DICTIONARY 40933 // Prototypes inline short int is_subset(short int *entry, short int *have); inline void populate_dictionary(short int *entry, char *line); int main (void) { short int best = 0; short int word[DICTIONARY][CONDENSED]; int idx = 0; static const char filename[] = "reduced.txt"; FILE *file = fopen (filename, "r"); if (file != NULL) { char line[LINE_LEN]; while (fgets(line, sizeof line, file) != NULL) { populate_dictionary(word[idx], line); ++idx; } fclose(file); } else { perror(filename); } //exit(0); static const short int max[26] = {4, 3, 3, 4, 4, 4, 4, 3, 3, 2, 3, + 4, 3, 4, 4, 3, 1, 4, 5, 3, 4, 2, 3, 2, 3, 4}; short int have[26] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; int i, j, k, l, m, n, o, p, q; short int subsets; // Evaluate all 12,461,993 possible 8 letter combinations for (i = 0; i <= LAST_LETTER; i++) { if (++have[i] > max[i]) { --have[i]; continue; } for (j = i; j <= LAST_LETTER; j++) { if (++have[j] > max[j]) { --have[j]; continue; } for (k = j; k <= LAST_LETTER; k++) { if (++have[k] > max[k]) { --have[k]; continue; } for (l = k; l <= LAST_LETTER; l++) { if (++have[l] > max[l]) { --have[l]; continue; } for (m = l; m <= LAST_LETTER; m++) { if (++have[m] > max[m]) { --have[m]; continue; } for (n = m; n <= LAST_LETTER; n++) { if (++have[n] > max[n]) { --have[n]; continue; } for (o = n; o <= LAST_LETTER; o++) { if (++have[o] > max[o]) { --have[o]; continue; } for (p = o; p <= LAST_LETTER; p++) { if (++have[p] > max[p]) { --have[p]; continue; } subsets = 0; for (idx = 0; idx < DICTIONARY; id +x++) { if (is_subset(word[idx], have) +) { ++subsets; } } if (subsets >= best) { best = subsets; for (idx = 0; idx <= LAST_LETT +ER; idx++) { for (q = 1; q <= have[idx] +; q++) { printf("%c", idx + ASC +II); } } printf(" = %i\n", best); } --have[p]; } --have[o]; } --have[n]; } --have[m]; } --have[l]; } --have[k]; } --have[j]; } --have[i]; } return 0; } inline short int is_subset(short int *entry, short int *have) { short int i; for (i = 0; i < CONDENSED; i = i + 2) { if (entry[i] == -1) { return 1; } if (entry[i + 1] > have[entry[i]]) { return 0; } } return 1; } inline void populate_dictionary(short int *entry, char *line) { short int letter, i, j; // The max word length is 8 and there are 2 pairs for each possibl +e letter // This makes 16 slots // letter,count letter, count letter, count, etc for (i = 0; i < CONDENSED; i = i + 2) { entry[i] = -1; // letter entry[i + 1] = 0; // count } // Count how many of each letter are in the word for (i = 0; line[i] != '\n'; i++) { letter = line[i] - ASCII; for (j = 0; j < CONDENSED; j = j + 2) { if (entry[j] == -1) { entry[j] = letter; ++entry[j + 1]; break; } else if (entry[j] == letter) { ++entry[j + 1]; break; } } } }

I am sure there is a much better way of doing this. I will need to achieve over 5 million loops per second to achieve my goal of being completed within 24 hours. I kicked it off at 10:28PM EST. I will update again when it is completed.

Update 2: It only took 12 minutes to run which worries me, but here is the output (winner at the bottom):

aaaabbbc = 2 aaaabbbd = 4 aaaabbbe = 5 aaaabbbh = 5 aaaabbbl = 5 aaaabbby = 6 aaaabbcd = 6 aaaabbce = 7 aaaabbcl = 9 aaaabbcs = 9 aaaabbcy = 9 aaaabbde = 15 aaaabber = 16 aaaabbet = 18 aaaabcds = 19 aaaabcer = 22 aaaabcls = 25 aaaabdel = 29 aaaabder = 36 aaaabest = 42 aaaaeprs = 47 aaabbest = 48 aaabcder = 57 aaabcers = 63 aaabdegr = 64 aaabdelr = 65 aaabdels = 68 aaabdeor = 68 aaabders = 80 aaabelst = 94 aaaelpst = 109 aaaeprst = 118 aabcders = 129 aabcerst = 150 aabdeirs = 155 aabdelst = 158 aabderst = 164 aabelrst = 169 aaceprst = 197 aaeiprst = 209 abcdeirs = 215 abcdeors = 215 abcderst = 227 abcehrst = 231 abceorst = 231 abceprst = 234 abcerstu = 235 abdehrst = 241 abdeilrs = 264 abdeirst = 284 acehprst = 288 aceiprst = 304 aceoprst = 307 adeilrst = 319 adeiprst = 332 aeilprst = 344 aeinprst = 346

Cheers - L~R


In reply to Re: Challenge: 8 Letters, Most Words by Limbic~Region
in thread Challenge: 8 Letters, Most Words by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chanting in the Monastery: (4)
    As of 2015-07-30 02:49 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









      Results (269 votes), past polls