Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Another word puzzle with too many permutations

by sarchasm (Acolyte)
on Oct 15, 2013 at 16:38 UTC ( [id://1058322]=perlquestion: print w/replies, xml ) Need Help??

sarchasm has asked for the wisdom of the Perl Monks concerning the following question:

I've been working on some word puzzles where you are given a list of items (Cars, People, Countries, etc...) and asked to identify a subset of those items that are comprised of a given amount of letters. A simple example could be:

Word list: TOYOTA HONDA AUDI FORD Find: 2 words Given: A(1),D(2),F(1),I(1),O(1),R(1),U(1) Solution: AUDI FORD

Small lists can be done by hand with some trial and error but large lists present a huge challenge. I'm trying to figure out how to create a process to solve one of these puzzles given a list of 100 words and asked to find 10 items.

I have a list of 100 dogs and I am given a list of letters that make up 10 of the dog names. I tried to create a process that would identify all of the potential permutations r(10) from n(100) but with 5.35983370e+20 permutations, a linear approach makes this virtually impossible. Still, I gave it shot using Math::Combinatorics to calculate permutations and added a filter to only output records that have the same amount of letters as the given pool (112):

A(8),B(2),C(5),8(D),7(E),F(1),G(3),H(9),I(8),J(0),K(1),L(9),M(1),N(12) +,O(13),P(1),Q(0),R(5),S(6),T(4),U(6),V(0),W(1),X(1),Y(0),Z(1)

This approach just doesn't work due to the number of calculations required. I am hoping the monks have a more elegant solution or at least some ideas for me to explore. My attempt at a solution follows:

use Math::Combinatorics; open (DOGS, '>>Dogs.txt'); select DOGS; $| = 1; my $counter = 0; my @n = qw(Affenpinscher AfghanHound AiredaleTerrier Akita AlaskanMa +lamute AmericanEnglishCoonhound AmericanEskimoDog AmericanFoxhound Ba +senji BassetHound Beagle BeardedCollie Beauceron BlackandTanCoonhound + Bloodhound BluetickCoonhound BorderCollie BorderTerrier Borzoi Bosto +nTerrier BouvierdesFlandres Boxer BoykinSpaniel Briard Brittany Bruss +elsGriffon BullTerrier Bulldog Bullmastiff ChesapeakeBayRetriever Chi +huahua ChineseCrested ChineseShar-Pei Chinook ChowChow ClumberSpaniel + CockerSpaniel Collie Curly-CoatedRetriever Dachshund Dalmatian Dandi +eDinmontTerrier DobermanPinscher DoguedeBordeaux EnglishCockerSpaniel + EnglishFoxhound EnglishSetter EnglishSpringerSpaniel GermanPinscher +GermanShepherdDog GermanShorthairedPointer GermanWirehairedPointer Gi +antSchnauzer GlenofImaalTerrier GoldenRetriever GordonSetter GreatDan +e Greyhound Harrier IbizanHound IcelandicSheepdog IrishSetter IrishTe +rrier Kuvasz LabradorRetriever LakelandTerrier LhasaApso Lowchen Malt +ese ManchesterTerrier Mastiff MiniatureBullTerrier MiniaturePinscher +MiniatureSchnauzer NeapolitanMastiff Newfoundland NorfolkTerrier Peki +ngese PembrokeWelshCorgi PharaohHound Plott PortugueseWaterDog Pug Pu +li PyreneanShepherd RatTerrier ShibaInu ShihTzu SiberianHusky SilkyTe +rrier SkyeTerrier SmoothFoxTerrier SoftCoatedWheatenTerrier SpinoneIt +aliano St.Bernard StaffordshireBullTerrier WireFoxTerrier WirehairedP +ointingGriffon Xoloitzcuintli YorkshireTerrier); my $combinat = Math::Combinatorics->new(count => 10, data => [@n],); while(my @combo = $combinat->next_combination){ my $outrec = join('',@combo); if (length($outrec) == 112) { print DOGS $outrec."\n"; } $counter++; if ($counter == 100000) { print "100000 recs processed"."\n"; $counter = 0; } } close (DOGS);
Thank you all for your help and participation!!!

The code from Neighbour did the trick and scaled to puzzles with larger sets of data without consuming memory.

Replies are listed 'Best First'.
Re: Another word puzzle with too many permutations
by LanX (Saint) on Oct 15, 2013 at 17:47 UTC
    Why don't you just count the letters???

    my %max =( A=>8, B=>2, C=>5, D=>8, E=>7, F=>1, G=>3, H=>9, I=>8, K=>1, L=>9, M=>1, N=>12, O=>13, P=>1, Q=>0, R=>5, S=>6, T=>4, U=>6, V=>0, W=>1, X=>1, Y=>0, Z=>0); my @n = qw(Affenpinscher AfghanHound AiredaleTerrier Akita AlaskanMala +mute AmericanEnglishCoonhound AmericanEskimoDog AmericanFoxhound Base +nji BassetHound Beagle BeardedCollie Beauceron BlackandTanCoonhound B +loodhound BluetickCoonhound BorderCollie BorderTerrier Borzoi BostonT +errier BouvierdesFlandres Boxer BoykinSpaniel Briard Brittany Brussel +sGriffon BullTerrier Bulldog Bullmastiff ChesapeakeBayRetriever Chihu +ahua ChineseCrested ChineseShar-Pei Chinook ChowChow ClumberSpaniel C +ockerSpaniel Collie Curly-CoatedRetriever Dachshund Dalmatian DandieD +inmontTerrier DobermanPinscher DoguedeBordeaux EnglishCockerSpaniel E +nglishFoxhound EnglishSetter EnglishSpringerSpaniel GermanPinscher Ge +rmanShepherdDog GermanShorthairedPointer GermanWirehairedPointer Gian +tSchnauzer GlenofImaalTerrier GoldenRetriever GordonSetter GreatDane +Greyhound Harrier IbizanHound IcelandicSheepdog IrishSetter IrishTerr +ier Kuvasz LabradorRetriever LakelandTerrier LhasaApso Lowchen Maltes +e ManchesterTerrier Mastiff MiniatureBullTerrier MiniaturePinscher Mi +niatureSchnauzer NeapolitanMastiff Newfoundland NorfolkTerrier Peking +ese PembrokeWelshCorgi PharaohHound Plott PortugueseWaterDog Pug Puli + PyreneanShepherd RatTerrier ShibaInu ShihTzu SiberianHusky SilkyTerr +ier SkyeTerrier SmoothFoxTerrier SoftCoatedWheatenTerrier SpinoneItal +iano St.Bernard StaffordshireBullTerrier WireFoxTerrier WirehairedPoi +ntingGriffon Xoloitzcuintli YorkshireTerrier); WORD: for my $word (@n) { my %count=%max; for $c ( split //, uc($word) ) { next WORD if $count{$c}-- <=0; } print "$word\n"; }

    output:

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update

    If characters like dash or dot are allowed prepend grep {/[A-Z]/} to the for condition, e.g "St.Bernard" will be included too.

      I am very intrigued by this approach and will explore it a bit more. I should end up with a list of the "10" words that use all of the available letters. Also, I checked the resulting output and it doesn't include "Xolitzcuintli" but that's because I forgot to include 1 "Z". It does appear in the output when I set Z to 1. This does leave me with a much smaller list but I would still have to identify the 10 words out of the 72 that create the solution.
        > that use all of the available letters

        please be precise, AUDI has only one D in the example given.

        changing my code to show only exact matches is no problem, just check at the end if all values are 0.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re: Another word puzzle with too many permutations
by MidLifeXis (Monsignor) on Oct 15, 2013 at 17:07 UTC

    How about a recursive solution where you stop when you can no longer apply the remaining alphabet to the search space? That way you don't try all combinations, but only those that have a potential of succeeding.

    # pseudocode sub find_some ( @choices, @alphabet ) { $results = []; for $choice ( @choices ) { if ( choice_is_in_alphabet ) { @new_choices = @choices - $choice; @new_alphabet = @alphabet - letters_in( $choice ); push @$results, [ $choice, find_some( @new_choices, @new_a +lphabet ) ]; } } return $results; }

    You will be left with a spanning tree of all good solutions. With a stack you could also remove the recursion.

    Update: Scope error

    Update 2: I have noticed a couple of minor logic errors as well, but they appear to be easy enough to correct.

    --MidLifeXis

        Well, I ran your code using my 4 car example and it gave the right solution. I had mixed results with the dogs but it could be an issue on my side. I am running some more tests but I think this may be the path to what I am looking for. Thank you for taking the time to put a working script together.
      I agree that I need to figure out some way to use the pool of letters in the process to detect when the list is going down a path that doesn't fit the solution. I will think about your pseudocode and see if I can come up with anything. Thank you.
Re: Another word puzzle with too many permutations
by hdb (Monsignor) on Oct 15, 2013 at 20:15 UTC

    Here is a backtracking method which works on sorted data, dogs with rare letters first. (Similar to LanX's proposal.)

    use strict; use warnings; use Data::Dumper; sub score { my ($dog, $letters) = @_; my $sum = 0; $sum += $$letters{$_}//0 for grep {/\w/} split //, $dog; return $sum/length($dog); } my @n = qw(Affenpinscher AfghanHound AiredaleTerrier Akita AlaskanMala +mute AmericanEnglishCoonhound AmericanEskimoDog AmericanFoxhound Base +nji BassetHound Beagle BeardedCollie Beauceron BlackandTanCoonhound B +loodhound BluetickCoonhound BorderCollie BorderTerrier Borzoi BostonT +errier BouvierdesFlandres Boxer BoykinSpaniel Briard Brittany Brussel +sGriffon BullTerrier Bulldog Bullmastiff ChesapeakeBayRetriever Chihu +ahua ChineseCrested ChineseShar-Pei Chinook ChowChow ClumberSpaniel C +ockerSpaniel Collie Curly-CoatedRetriever Dachshund Dalmatian DandieD +inmontTerrier DobermanPinscher DoguedeBordeaux EnglishCockerSpaniel E +nglishFoxhound EnglishSetter EnglishSpringerSpaniel GermanPinscher Ge +rmanShepherdDog GermanShorthairedPointer GermanWirehairedPointer Gian +tSchnauzer GlenofImaalTerrier GoldenRetriever GordonSetter GreatDane +Greyhound Harrier IbizanHound IcelandicSheepdog IrishSetter IrishTerr +ier Kuvasz LabradorRetriever LakelandTerrier LhasaApso Lowchen Maltes +e ManchesterTerrier Mastiff MiniatureBullTerrier MiniaturePinscher Mi +niatureSchnauzer NeapolitanMastiff Newfoundland NorfolkTerrier Peking +ese PembrokeWelshCorgi PharaohHound Plott PortugueseWaterDog Pug Puli + PyreneanShepherd RatTerrier ShibaInu ShihTzu SiberianHusky SilkyTerr +ier SkyeTerrier SmoothFoxTerrier SoftCoatedWheatenTerrier SpinoneItal +iano St.Bernard StaffordshireBullTerrier WireFoxTerrier WirehairedPoi +ntingGriffon Xoloitzcuintli YorkshireTerrier); my $letters = "A(8),B(2),C(5),D(8),E(7),F(1),G(3),H(9),I(8),K(1),L(9), +M(1),N(12),O(13),P(1),Q(0),R(5),S(6),T(4),U(6),V(0),W(1),X(1),Y(0),Z( +1)"; my %letters = map { /(\w)\((\d+)\)/; { $1 => $2 } } split /,/, $letter +s; @n = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, score( $ +_, \%letters ) ] } @n; $letters = join '', sort { length $a <=> length $b } map { /(\w)\((\d+ +)\)/; $2?$1 x $2:'' } split /,/, $letters; sub dogtree { my ( $level, $tree, $letters, @dogs ) = @_; print "$tree\n" and exit unless $letters; DOG: while( my $dog = shift @dogs) { my $remaining = $letters; $remaining =~ s/$_//i or next DOG for grep { /\w/ } split //, $dog +; dogtree( $level+1, "$tree $dog", $remaining, @dogs ); } } dogtree 0, '', $letters, @n;

    Takes 14 seconds on my machine to find one solution.

      That works very well and it's pretty quick too. I just started running it with a much bigger set of data so I will see what happens. Thank you!

        A bit more experimentation with sorting and scoring functions has (sadly) lead me to the conclusion that the fast performance of my posted code is more of an accident rather than due to a great algorithm, i.e. it works very well for this data set but not necessarily on others. So be careful before you bet too much on it...

Re: Another word puzzle with too many permutations
by ramlight (Friar) on Oct 15, 2013 at 17:52 UTC
    I was doing something similar for a game that I was fooling around with at home. I wanted to find all of the possible 3-7 letter English words that could be made up from letters in a single 7-letter word. The algorithm that I used was:

    • get all possible permutations of letters
    • eliminate duplicates
    • look up each possible word in the dictionary to see if it was a real word
    The key to efficient searches was to keep the list in alphabetic order, make a rough guess at the starting point (based on the first letter of the word), and perform a binary search to locate the word.
Re: Another word puzzle with too many permutations
by LanX (Saint) on Oct 15, 2013 at 20:06 UTC
    this primitive branch-and-bound will eventually spit out a solution but it might take days.

    use strict; use warnings; use Data::Dump qw/pp/; use feature qw/say/; my %max =( A=>8, B=>2, C=>5, D=>8, E=>7, F=>1, G=>3, H=>9, I=>8, K=>1, L=>9, M=>1, N=>12, O=>13, P=>1, Q=>0, R=>5, S=>6, T=>4, U=>6, V=>0, W=>1, X=>1, Y=>0, Z=>1); my @n = qw(Affenpinscher AfghanHound AiredaleTerrier Akita AlaskanMala +mute AmericanEnglishCoonhound AmericanEskimoDog AmericanFoxhound Base +nji BassetHound Beagle BeardedCollie Beauceron BlackandTanCoonhound B +loodhound BluetickCoonhound BorderCollie BorderTerrier Borzoi BostonT +errier BouvierdesFlandres Boxer BoykinSpaniel Briard Brittany Brussel +sGriffon BullTerrier Bulldog Bullmastiff ChesapeakeBayRetriever Chihu +ahua ChineseCrested ChineseShar-Pei Chinook ChowChow ClumberSpaniel C +ockerSpaniel Collie Curly-CoatedRetriever Dachshund Dalmatian DandieD +inmontTerrier DobermanPinscher DoguedeBordeaux EnglishCockerSpaniel E +nglishFoxhound EnglishSetter EnglishSpringerSpaniel GermanPinscher Ge +rmanShepherdDog GermanShorthairedPointer GermanWirehairedPointer Gian +tSchnauzer GlenofImaalTerrier GoldenRetriever GordonSetter GreatDane +Greyhound Harrier IbizanHound IcelandicSheepdog IrishSetter IrishTerr +ier Kuvasz LabradorRetriever LakelandTerrier LhasaApso Lowchen Maltes +e ManchesterTerrier Mastiff MiniatureBullTerrier MiniaturePinscher Mi +niatureSchnauzer NeapolitanMastiff Newfoundland NorfolkTerrier Peking +ese PembrokeWelshCorgi PharaohHound Plott PortugueseWaterDog Pug Puli + PyreneanShepherd RatTerrier ShibaInu ShihTzu SiberianHusky SilkyTerr +ier SkyeTerrier SmoothFoxTerrier SoftCoatedWheatenTerrier SpinoneItal +iano St.Bernard StaffordshireBullTerrier WireFoxTerrier WirehairedPoi +ntingGriffon Xoloitzcuintli YorkshireTerrier); #%max=( A=>1,D=>2,F=>1,I=>1,O=>1,R=>1,U=>1); #@n = qw/ HONDA TOYOTA AUDI FORD/; my @path; our $level=0; sub rec { my ($idx0,%max) = @_; #pp $idx0,\%count; local $level=$level+1; WORD: for (my $idx = $idx0; $idx <@n; $idx++){ #say "*** ", my $word = $n[$idx]; my %count=%max; # say " " x $level ,$word; for my $char ( grep {/[A-Z]/} split //, uc($word) ) { next WORD if $count{$char}-- <=0; } push @path, $word; unless (grep { $_ != 0 } values %count ) { print "RESULT: ", join ",",@path,"\n"; # return; goto STOP; } rec($idx+1,%count); pop @path; } } rec(0,%max); STOP:

    a smarter approach would be to count the letters for each word in an initialization phase and to sort them according to exotic letters.

    for instance if Z is 0 there is no point to check words with a Z in the following branch.

    So start to find all allowed combinations for the most rare letter and continuing with the next rare one should greatly improve runtime.

    Unfortunately I have no time to fiddle with entertainment problems, I hope my input helps! =)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      I do appreciate your input. I am learning a lot by going through the options presented here. Thank you.
Re: Another word puzzle with too many permutations
by Neighbour (Friar) on Oct 16, 2013 at 14:55 UTC
    After getting the right (and only) solution, here's my code. I'm not a concise writer as the others who have posted code here, but it works :) (and it's reasonably fast: 1s until the solution has been found).
      Definitely the fastest solution with the correct result thus far. I changed the puzzle to identify 20 items from a list of 100 and it is running. I did this same thing with a couple of the other solutions and they either run out of memory or haven't completed yet. It doesn't look like your code consumes much memory so I will let it run and see what happens. Thank you.
        WOW!!! The solution kicked out after 23 minutes. I've been running this particular puzzle through a couple of different programs that are still churning after 24 hours. I also appreciate the comments in your script. I am looking forward to studying the code a little closer. Thank you very much!
Re: Another word puzzle with too many permutations
by Laurent_R (Canon) on Oct 15, 2013 at 17:20 UTC

    Maybe you could sort the letters of each word into alphabetic order and use a hash to make the link between sorted words and real words. Then also sort your input letters. I am not sure right now on how to go from there (possibl building a HoH), but I think that "normalizing" the words this way should enable you to do much faster lookup.

      I've thought about that. Either sorting the letters in each word into alphabetical order or keeping a total of each letter being used for each word and building a process to compare against the available pool of letters. It is similar to the word based approach but I keep thinking it might be faster. I'm just not sure where to begin that process yet. Thanks for the reply.

        I think that the best is to start from your list of words. Let's assume for a moment (to simplify the problem) that each letter can come only once both in the list and in each of the words. Check each word, if it has at least one letter not in your list of letters,remove it from the list. At the end, the words still in the list can all be built with with your list of letters. You don't need to test letter combinations, but just to examine each word once. Having sorted your words and your list makes it fast and easy to check whether the letters of words are all in the list of letters.

        With some letters being there more than once, it is slightly more complicated, but not so much. You only need to check for each word if you have the right amount of each letter in the list.

        This can be extremely fast.

        EDIT: I had not seen Rolf's post when I posted mine, but it is essentially the same idea.

Re: Another word puzzle with too many permutations
by hdb (Monsignor) on Oct 15, 2013 at 18:50 UTC

    If I am not mistaken, you only have 111 letters, not 112. Is this just some counting mistake on your side, or could this spoil the solution?

      You are correct. I forgot to put a Z(1).
        I just updated the description to reflect the change. Thank you.
Re: Another word puzzle with too many permutations
by Lennotoecom (Pilgrim) on Oct 16, 2013 at 08:58 UTC
    I got the task as follows: you are given some dictionary, and some string of repeating letters.
    The main goal: find unique, matched by all letters words out of the given dictionary and do it fast).
    Here is my solution
    (it works equally fast on either on the dogs or cars (takes about 0.1s))
    @words = qw/toyota dorf honda audi ford/; %hash=('adduhifoorndatoyota' => ''); $unique = 0; foreach $word (@words){ foreach $key (sort keys %hash){ $length = 0; if($key=~m/\#/){$_ = $'} else {$_= $key} foreach $l (split //, $word){if(/$l/){$_ = "$`$'"; $length++;}} if($length == length($word)){ $newKey = '#'.$_; if($key=~m/\#/){ if($newKey eq '#'){$newKey .= $unique++} $hash{$newKey} = $hash{$key}; $hash{$newKey}.= ' '.$word; } else { $hash{$newKey} = $word; } } } } foreach (sort keys %hash){ print $hash{$_},"\n" if /\#\d+/; } -----OUTPUT----- toyota dorf honda audi toyota honda audi ford
    if you noticed I intentionally put dorf into dictionary,
    so there would be two unique combinations.
    Exactly the same works with the dogs.
    It takes about 0.1 seconds to finish calculations.
    :)
    But requires at least two words being mixed in the string line.

    Update:

    @words = qw/toyota dorf honda audi ford/; %hash=('adduhifoorndatoyota' => ''); $unique = 0; foreach $word (@words){ foreach $key (sort keys %hash){ $length = 0; if($key=~m/\#/){$_ = $'} else {$_= $key} foreach $l (split //, $word){if(/$l/){$_ = "$`$'"; $length++;}} if($length == length($word)){ $newKey = '#'.$_; if($key=~m/\#/){ if($newKey eq '#'){$newKey .= $unique++} $hash{$newKey} = $hash{$key}; $hash{$newKey}.= ' '.$word; } else { $hash{$newKey} = $word; } } } } foreach (sort keys %hash){ print $hash{$_},"\n" if /\#\d+/; } -----OUTPUT----- toyota dorf honda audi toyota honda audi ford

      Please restore your code. On my machine it took 72 minutes to find the correct solution for the dog's dataset using your code. The %hash contains around 5,000,000 entries when it is done.

      It is also very similar to something I wrote for Challenge: 8 Letters, Most Words. This I have not (yet) posted as it already runs for 8 days and has only processed 90% of the input. It also generates a hash with possible solutions and is expected to be of the same order of magnitude.

Re: Another word puzzle with too many permutations
by Neighbour (Friar) on Oct 16, 2013 at 11:20 UTC
    How many solutions are there for the dog names? Right now, my program yields the following:
    [38648] solutions found real 3m36.816s user 3m28.169s sys 0m2.624s

      As my code only found one, would you be so kind to publish two of yours.

        Sure. I'm guessing that I might have some doubles in my solution set, but I'm still working on that (and also trying to get the speed up a little).
        Edit: Hmm, after rereading the OP again, it dawns on me that the given set of letters has to be used completely (instead of partially, which is what I'm currently doing)...back to the drawing board :)
      There should only be 1 solution:
      American English Coonhound Bloodhound Bulldog Chinook Dachshund English Setter Harrier Lhasa Apso Newfoundland Xoloitzcuintli

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1058322]
Approved by davido
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (3)
As of 2024-04-19 01:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found