Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

possible combinations in sequence

by ruzam (Curate)
on Jun 08, 2006 at 22:45 UTC ( #554374=snippet: print w/ replies, xml ) Need Help??

Description: I have a text string which contains a sequence of words. For example:
horse:cow:dog:cat
I needed to generate every combination of words (missing or not), but each combination must maintain the original sequence. For example a partial result of the above:
cow, cat, horse:cow, horse:cat, cow:dog:cat, etc
(cow:horse would be considered out of sequence)
Here's how I handled it. Perhaps there's a faster way?
use strict;
use warnings;

# initial source in sequence order
my $source = 'horse:cow:dog:cat';

# hash used to store results (weed out dups)
my %hash;

function(\%hash, $source);

# and now the results
foreach (keys %hash) {
    print "$_\n";
}

# generate array of combinations
# (hash keys are the results)
sub function {
    my $hash_ref = shift;
    my $source = shift;

    # skip it if we've been here
    return if $hash_ref->{$source};

    $hash_ref->{$source}++;

    # remove one of each part from the whole
    if ((my @parts = split(/:/, $source)) > 1) {
        for (my $i = 0; $i < @parts; $i++) {
            my @parts_copy = split(/:/, $source);
            splice(@parts_copy, $i, 1);
            function($hash_ref, join(':',@parts_copy));
        }
    }
    return;
}

Comment on possible combinations in sequence
Download Code
Re: possible combinations in sequence
by ikegami (Pope) on Jun 08, 2006 at 23:33 UTC

    An alternative:

    use Algorithm::Loops qw( NestedLoops ); my $source = 'horse:cow:dog:cat'; my @parts = split(/:/, $source); my $iter = NestedLoops( [ [ 0..$#parts ], ( sub { [ $_+1..$#parts ] } ) x $#parts, ], { OnlyWhen => 1 }, ); my @s; print(join(':', map $parts[$_], @s), "\n") while @s = $iter->();

    Update: Even better:

    my $source = 'horse:cow:dog:cat'; my @parts = split(/:/, $source); for my $comb (1..2**@parts-1) { my $s = join ':', map $parts[$_], grep $comb & (1<<$_), 0..$#parts; print("$s\n"); }

    Update: Neat, and even faster:

    my $source = 'horse:cow:dog:cat'; local $_ = ":$source:"; my $parts = tr/:/:/ - 1; my $re = '(?{ "" })' . '(:[^:]*)(?=:)(?{ $^R . $^N })' . '(?:.*(:[^:]*)(?=:)(?{ $^R . $^N })' x ($parts-1) . ')?' x ($parts-1) . '(?{ push @rv, substr($^R, 1) })' . '(?!)'; { use re 'eval'; $re = qr/$re/; } local our @rv; /$re/; print "$_\n" foreach @rv;
Re: possible combinations in sequence
by rhesa (Vicar) on Jun 08, 2006 at 23:41 UTC
    My basic idea is to map the array indices to bits in a binary number. If a bit is on, you take that element out of the source array. For example:
    0 = 0b0000 --> {nothing} 3 = 0b0011 --> 'horse:cow' 13 = 0b1101 --> 'horse:dog:cat'
    The algorithm then simply becomes a loop over 1 .. 2**@kw -1, testing the bits for each number.

    Here's my first implementation of it. It's probably not as efficient as possible yet.

    sub rhesa { # initial source in sequence order my $source = 'horse:cow:dog:cat'; my @kw = split /:/, $source; my @res; for my $i( 1 .. 2**@kw - 1 ) { my @ar; my $t; while( $i > 0 ) { push @ar, $kw[$t] if $i & 1; $i >>= 1; $t++; } push @res, join ':', @ar; } return @res; }
    I'm a bit irritated with the number of temporary variables, but I can't think of anything prettier just now. Hope it helps :)

    BTW, a simple Benchmark comparison showed a 200% speed increase over your version.

      rhesa, I like your version.

      Here's my somewhat golfed variation:

      sub rhesa2 { my $source = shift; my @kw = split /:/, $source; map { my (@ar, $t); do { ($_ & 1) and push @ar, $kw[$t]; $t++; } while ($_ >>= 1); join ':', @ar } ( 1 .. 2**@kw - 1 ) } my @res = rhesa2('horse:cow:dog:cat');

      s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

      Yours is also faster (albeit only 7% faster) than my grep approach, but my regexp approach is 13% faster than yours.

      Rate ikegami1 ruzam ikegami2 rhesa ikegami3 ikegami1 2381/s -- -19% -66% -68% -72% ruzam 2944/s 24% -- -58% -61% -65% ikegami2 7072/s 197% 140% -- -5% -16% rhesa 7478/s 214% 154% 6% -- -11% ikegami3 8420/s 254% 186% 19% 13% --
        Your use of grep is adorable! I think it's by far the most readable version, with the clearest exposition of intent.

        The regexp on the other hand... the approach certainly is neat, i'll give you that ;) ++ for speed and ingenuity, but ouch does my head spin!

        You guys are so beyond awesome! ikegami3 is nothing short of brilliance :) ++ to ikegami. rhesa, and liverpole.
        Thanks to ikegami's benchmark, I ran my own benchmarks. I excluded ikegami1 simply because of the 'Algorithm::Loops' dependency. Then just for personal interest, I copied ikegami3 and replaced the '$parts - 1' parts:
        sub ikegami3x { local $_ = ":$_[0]:"; my $parts = tr/:/:/ - 2; # take 2 here instead of -1 later my $re = '(?{ "" })' . '(:[^:]*)(?=:)(?{ $^R . $^N })' . '(?:.*(:[^:]*)(?=:)(?{ $^R . $^N })' x $parts . ')?' x $parts . '(?{ push @rv, substr($^R, 1) })' . '(?!)'; { use re 'eval'; $re = qr/$re/; } local our @rv; /$re/; return @rv; }

        I also included rhesa2 with a slight change to eliminate 'uninitialized' warnings
        sub rhesa2 { my @kw = split /:/, $_[0]; map { my @ar; my $t = 0; # initialize $t do { ($_ & 1) and push @ar, $kw[$t]; $t++; } while ($_ >>= 1); join ':', @ar; } ( 1 .. 2**@kw - 1 ); }

        I evened up all test functions to use $_[0], and finally I ran tests against different 'word counts' of the source (in actual use, $source will contain varying numbers of words).
        These are my benchmark results (I've run this several times to come up with more or less the same results)
        source: horse:cow:dog:cat Rate ruzam ikegami3 ikegami2 ikegami3x rhesa2 rhesa ruzam 4620/s -- -61% -61% -61% -66% -66% ikegami3 11764/s 155% -- -0% -1% -12% -14% ikegami2 11819/s 156% 0% -- -1% -12% -13% ikegami3x 11935/s 158% 1% 1% -- -11% -13% rhesa2 13444/s 191% 14% 14% 13% -- -2% rhesa 13657/s 196% 16% 16% 14% 2% -- source: horse Rate ikegami3 ikegami3x ruzam ikegami2 rhesa rhesa2 ikegami3 40841/s -- -1% -37% -62% -64% -72% ikegami3x 41226/s 1% -- -37% -62% -64% -72% ruzam 65317/s 60% 58% -- -39% -43% -55% ikegami2 107178/s 162% 160% 64% -- -6% -26% rhesa 114470/s 180% 178% 75% 7% -- -21% rhesa2 145232/s 256% 252% 122% 36% 27% -- source: horse:cat Rate ruzam ikegami3 ikegami3x ikegami2 rhesa rhesa2 ruzam 26853/s -- -2% -7% -48% -53% -58% ikegami3 27324/s 2% -- -5% -47% -52% -58% ikegami3x 28732/s 7% 5% -- -45% -50% -55% ikegami2 51965/s 94% 90% 81% -- -9% -19% rhesa 57233/s 113% 109% 99% 10% -- -11% rhesa2 64472/s 140% 136% 124% 24% 13% -- source: horse:cow:cat Rate ruzam ikegami3 ikegami3x ikegami2 rhesa rhesa2 ruzam 10772/s -- -41% -42% -58% -61% -64% ikegami3 18305/s 70% -- -1% -28% -33% -38% ikegami3x 18436/s 71% 1% -- -27% -33% -38% ikegami2 25353/s 135% 39% 38% -- -7% -15% rhesa 27363/s 154% 49% 48% 8% -- -8% rhesa2 29753/s 176% 63% 61% 17% 9% -- source: horse:cow:dog:cat:mouse Rate ruzam rhesa ikegami2 rhesa2 ikegami3x ikegami3 ruzam 1632/s -- -67% -68% -73% -75% -75% rhesa 5021/s 208% -- -3% -17% -24% -24% ikegami2 5159/s 216% 3% -- -14% -22% -22% rhesa2 6023/s 269% 20% 17% -- -9% -9% ikegami3x 6614/s 305% 32% 28% 10% -- -0% ikegami3 6634/s 307% 32% 29% 10% 0% --

        I can't nail down the box so the results can fluctuate quite a bit from test to test, but overall these seem to be consistent. rhesa2 takes the lead up to 4 words, ikegami3 takes over at 5 words (and even more so at 6 words). rhesa2 rocks in the low word counts, where as ikegami3 seems to have more overhead. In my real world use, the word count is usually 4 or less (4 was just a good example size), so rhesa2 wins and replaceses my original ruzam.
        ikegami,
        I was tired last night when I found this thread but I wanted to point out Finding all Combinations. I would be interested in seeing how the ones that produce the correct order compare (specifically mine).

        Cheers - L~R

      My basic idea is to map the array indices to bits in a binary number. If a bit is on, you take that element out of the source array.

      This is precisely the same approach that I used in Data::PowerSet, for indeed, what the OP is looking for is the power set of the list.

      • another intruder with the mooring in the heart of the Perl

Re: possible combinations in sequence
by roboticus (Canon) on Jun 09, 2006 at 04:38 UTC
    ruzam:

    Here's my stab at it. I took out the hash because my method doesn't generate duplicates. Instead I just return the list of results:

    #!/usr/bin/perl -w use strict; use warnings; # initial source in sequence order my $source = 'horse:cow:dog:cat'; function($source); # and now the results foreach (function($source)) { print "$_\n"; } # generate array of combinations sub function { my @t = split /:/, shift; my @res=(shift @t); for my $i (@t) { @res=($i, @res, map{$_.':'.$i} @res); } return @res; }
    UPDATE: I didn't benchmark it because I've never used that module before. (I'll have to go install it and read up on it.) But I submitted it because I suspect that generating the strings from components might be faster than removing chunks. Any benchmarking ninjas out there wanna help me out?

    --roboticus

      Whollopin Websnappers!
      Taking a decisive lead, and proving yet again that simplicity is beauty: roboticus++
      source: horse:cow:dog:cat Rate ruzam ikegami3 rhesa rhesa2 roboticus ruzam 4655/s -- -62% -66% -66% -80% ikegami3 12238/s 163% -- -10% -11% -47% rhesa 13612/s 192% 11% -- -1% -41% rhesa2 13742/s 195% 12% 1% -- -40% roboticus 22986/s 394% 88% 69% 67% -- source: horse Rate ikegami3 ruzam rhesa rhesa2 roboticus ikegami3 41518/s -- -36% -64% -72% -84% ruzam 64752/s 56% -- -43% -56% -75% rhesa 113875/s 174% 76% -- -22% -56% rhesa2 146152/s 252% 126% 28% -- -43% roboticus 257311/s 520% 297% 126% 76% -- source: horse:cat Rate ruzam ikegami3 rhesa rhesa2 roboticus ruzam 27050/s -- -3% -53% -59% -72% ikegami3 27836/s 3% -- -52% -57% -71% rhesa 57415/s 112% 106% -- -12% -40% rhesa2 65371/s 142% 135% 14% -- -32% roboticus 96089/s 255% 245% 67% 47% -- source: horse:cow:cat Rate ruzam ikegami3 rhesa rhesa2 roboticus ruzam 10920/s -- -43% -61% -64% -76% ikegami3 19183/s 76% -- -32% -37% -59% rhesa 28259/s 159% 47% -- -7% -39% rhesa2 30244/s 177% 58% 7% -- -35% roboticus 46412/s 325% 142% 64% 53% -- source: horse:cow:dog:cat:mouse Rate ruzam rhesa2 rhesa ikegami3 roboticus ruzam 1855/s -- -67% -68% -72% -82% rhesa2 5676/s 206% -- -2% -14% -45% rhesa 5781/s 212% 2% -- -13% -44% ikegami3 6614/s 257% 17% 14% -- -36% roboticus 10353/s 458% 82% 79% 57% -- source: horse:cow:dog:cat:rat:mouse Rate ruzam rhesa2 rhesa ikegami3 roboticus ruzam 799/s -- -70% -70% -77% -85% rhesa2 2659/s 233% -- -0% -24% -50% rhesa 2667/s 234% 0% -- -24% -50% ikegami3 3521/s 340% 32% 32% -- -34% roboticus 5371/s 572% 102% 101% 53% --

      If you want a huge speed boost, replace
      @rv=($i, @rv, map{$_.':'.$i} @rv);
      with
      push @rv, $i, map{$_.':'.$i} @rv;

      By the way, I found confusing the use of $i for something which is not an index and not even numerical. $part would have worked fine, since it's an element of @parts. Well, it was named @parts before you renamed to worthless @t. Nice algorithm, but shoddy code.

        ikegami:

        Thanks for the tip on the speed boost. Once I get the benchmarking stuff installed I'll play with it. While I do like your suggestion, I prefer the order that my method generates--all one-word combinations first, then the two-word combinations, etc.)

        Re: shoddy code. Yeah, I guess so, consider me properly admonished. ++ for calling me on it and keeping me honest. When I thunk up the technique to use, I just erased the original function body and started whacking away at it. I didn't think to clarify things by using better variable names. (Of course, I just came off of a little golfing trip so my head was in "trim keystrokes" mode.</lame_excuse_mode>) Now, I guess the proper thing to do is to clean it up a little and insert your suggestion, so here goes:

        <pedagogical_mode>

        sub function { my @parts = split /:/, shift; # Null is the complete list of combinations for # an empty word list my @combinations=(); # Sequentially (recursively with tail recursion # removed) rebuild the combination list adding one # new word each iteration for my $new_word (@parts) { # Given a complete set of combinations for a # given list of words, we can add a new word to # the list and generate all new valid combinations # by concatenating to the original list: push @combinations, # the new word (a single word is a valid # combination) $new_word, # and the original list with the new word # glommed onto the end of each member map {$_.':'.$new_word} @combinations ; } return @combinations; }
        </pedagogical_mode>

        --roboticus

Re: possible combinations in sequence
by QM (Vicar) on Jun 13, 2006 at 23:40 UTC
    And for completeness, the glob solution, which spends more code fixing up the output than actually generating the results:
    sub qm { my ($glob) = @_; my @rv; $glob =~ s/(\w+)/{$1,}/g; for my $combo (glob($glob)) { $combo =~ s/^:+//; next unless length($combo); push @rv, join ':', split /:+/, $combo; } return @rv; }
    and is horribly slow as well.

    BTW, it's interesting to note the change in benchmark results when the input is a long list of null strings:

    $source = ':'x20;

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Back to Snippets Section

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://554374]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2014-12-19 22:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (94 votes), past polls