
Cleaned up the code a bit:

use strict;
use warnings;

# True if every letter of $subclass can be removed from $class.
sub fits {
    my( $class, $subclass ) = @_;
    $class =~ s/$_// or return 0 for split //, $subclass;
    return 1;
}

open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n";
my $words = do { local $/; <$dict> };
close $dict;

# Words of up to 8 letters, longest first.
my @words = sort { length($b) <=> length($a) }
            sort $words =~ /\b(\w{1,8})\b/g;

my %classes;
for my $w (@words) {
    my $fitted = 0;
    # Add the word to every existing class it fits into.
    fits( $_, $w ) and $fitted = $classes{$_}{$w} = 1 for keys %classes;
    # Otherwise start a new class keyed by the word's sorted letters.
    $classes{join '', sort split //, $w}{$w} = 1 unless $fitted;
}

# Largest classes first.
my @sorted = sort {
    scalar keys %{$classes{$b}} <=> scalar keys %{$classes{$a}}
} keys %classes;
print "$_: ", scalar keys %{$classes{$_}}, "\n" for @sorted;

So this is how it works:

  1. It reads the dictionary and sorts the words by descending length, i.e. the 8-letter words first, then the 7-letter words, and so on. It also sorts alphabetically, but that is not required.
  2. It then goes through the list of words and builds classes (sets of words) indexed by their common letters. More specifically, for each word it checks every class created so far to see whether that class has all the letters the word needs.
  3. If so, it adds the word to every existing class where it fits. It is important to add it to all fitting classes, not just one.
  4. If it fits nowhere, it creates a new class keyed by the word's sorted letters.
  5. This only works if long words are processed before short ones, because the letters of a class are fixed by the word that creates it.
  6. Whether a word fits is tested by removing each of its letters, one at a time, from the letters of the class. If that succeeds for every letter of the word, it fits and is added to the class.
  7. After processing all words, it sorts the classes by number of members, descending, and prints them in that order.
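
To make step 6 concrete, here is a minimal sketch of the fit check run on a few hand-picked strings. The sub follows fits() from the code above; the \Q...\E quoting is my addition, and the class key and test words are made up for illustration, not taken from the dictionary run.

```perl
use strict;
use warnings;

# Same idea as fits() above: delete each letter of the word from a copy
# of the class letters, and fail as soon as a letter cannot be found.
sub fits {
    my ( $class, $word ) = @_;
    for my $letter ( split //, $word ) {
        $class =~ s/\Q$letter\E// or return 0;
    }
    return 1;
}

# Hypothetical class key: the sorted letters of "teaching".
my $class = join '', sort split //, 'teaching';

print "$_: ", ( fits( $class, $_ ) ? "fits" : "does not fit" ), "\n"
    for qw( cheating eating night tenet );
```

Because each substitution removes only one occurrence, repeated letters are counted correctly: "tenet" needs two e's and two t's, the class has one of each, so it does not fit, while the other three words do.
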
It ran in 34 minutes. I'll be grateful for any hints on performance optimization.

Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words, this solution cannot put two four-letter words into one class unless a single earlier word has already created a class that both of them fit into. A class only ever holds the letters of the one word that created it. So in some situations it will not find the best solution.
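
A small sketch of that limitation, assuming the class-building loop described above. The sub name build_classes and the two input words are hypothetical and chosen only to show the effect.

```perl
use strict;
use warnings;

sub fits {
    my ( $class, $word ) = @_;
    $class =~ s/\Q$_\E// or return 0 for split //, $word;
    return 1;
}

# Build classes as in the code above, from a list that is already
# sorted by descending length.
sub build_classes {
    my @words = @_;
    my %classes;
    for my $w (@words) {
        my $fitted = 0;
        for my $key ( keys %classes ) {
            next unless fits( $key, $w );
            $classes{$key}{$w} = 1;
            $fitted = 1;
        }
        # No class fits: start a new one keyed by the sorted letters.
        $classes{ join '', sort split //, $w }{$w} = 1 unless $fitted;
    }
    return \%classes;
}

# Made-up input: two four-letter words with no letters in common.
my $classes = build_classes(qw( bird many ));
print "$_: @{[ sort keys %{ $classes->{$_} } ]}\n" for sort keys %$classes;
```

Each word ends up alone in its own four-letter class. The eight letters of "bird" and "many" together would cover both words, but no class with those letters is ever created, because class keys come only from single words.
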

In reply to Re^2: Challenge: 8 Letters, Most Words by hdb
in thread Challenge: 8 Letters, Most Words by Limbic~Region
