http://www.perlmonks.org?node_id=1057037


in reply to Re: Challenge: 8 Letters, Most Words
in thread Challenge: 8 Letters, Most Words

Cleaned up the code a bit:

use strict; use warnings; sub fits { my( $class, $subclass ) = @_; $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; for my $w (@words) { my $fitted = 0; fits( $_, $w ) and $fitted = $classes{$_}{$w} = 1 for keys %classes; $classes{join '', sort split //, $w}{$w} = 1 unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; print "$_: ", scalar keys %{$classes{$_}}, "\n" for @sorted;

So this is how it works:

  1. Reads the dictionary and sorts the words descending according to length, ie the 8-letter words first, then the 7-letter words etc. It also sorts alphabetically but that is not required.
  2. It then goes through the list of words and builds classes or sets of words indexed by their common letters. More specifically, it checks all classes created so far, whether there is one having all the letters needed for the next word.
  3. If yes, it adds it to all existing classes where it fits. It is important to add them not to one class but to all fitting classes.
  4. If not, it creates a new one.
  5. For this, it is important that long words are processed before short words.
  6. Whether it fits or not, is done by removing each letter of the new word from the letters of the class. If this is possible for all letters of the new word, then it fits and will be added to the class.
  7. After processing all words, it sorts the classes by number of members descending and prints them in that order.
It did run in 34 minutes. I'll be grateful for any hints on performance optimization.

Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words this solution will not be able to put two four letter words into one class irrespective of the letters. So in some situations it will not find the best solution.