Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: Challenge: 8 Letters, Most Words

by hdb (Monsignor)
on Oct 05, 2013 at 14:11 UTC ( [id://1057037]=note: print w/replies, xml ) Need Help??


in reply to Re: Challenge: 8 Letters, Most Words
in thread Challenge: 8 Letters, Most Words

Cleaned up the code a bit:

use strict; use warnings; sub fits { my( $class, $subclass ) = @_; $class =~ s/$_// or return 0 for split //, $subclass; return 1; } open my $dict, "<", "2of12inf.txt" or die "Cannot open dictionary!\n"; my $words = do { local $/; <$dict> }; close $dict; my @words = sort { length($b) <=> length($a) } sort $words =~ /\b(\w{1 +,8})\b/g; my %classes; for my $w (@words) { my $fitted = 0; fits( $_, $w ) and $fitted = $classes{$_}{$w} = 1 for keys %classes; $classes{join '', sort split //, $w}{$w} = 1 unless $fitted; } my @sorted = sort { scalar keys %{$classes{$b}} <=> scalar keys %{$cla +sses{$a}} } keys %classes; print "$_: ", scalar keys %{$classes{$_}}, "\n" for @sorted;

So this is how it works:

  1. Reads the dictionary and sorts the words descending according to length, ie the 8-letter words first, then the 7-letter words etc. It also sorts alphabetically but that is not required.
  2. It then goes through the list of words and builds classes or sets of words indexed by their common letters. More specifically, it checks all classes created so far, whether there is one having all the letters needed for the next word.
  3. If yes, it adds it to all existing classes where it fits. It is important to add them not to one class but to all fitting classes.
  4. If not, it creates a new one.
  5. For this, it is important that long words are processed before short words.
  6. Whether it fits or not, is done by removing each letter of the new word from the letters of the class. If this is possible for all letters of the new word, then it fits and will be added to the class.
  7. After processing all words, it sorts the classes by number of members descending and prints them in that order.
It did run in 34 minutes. I'll be grateful for any hints on performance optimization.

Update: As I realized when reading Re^5: Challenge: 8 Letters, Most Words this solution will not be able to put two four letter words into one class irrespective of the letters. So in some situations it will not find the best solution.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1057037]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (8)
As of 2024-04-25 11:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found