
Faster alternative to Math::Combinatorics

by AppleFritter (Vicar)
on Sep 01, 2017 at 13:20 UTC (#1198509=perlquestion)

AppleFritter has asked for the wisdom of the Perl Monks concerning the following question:

Oh monks of the round table, who dance whene'er they're able, who dine well here in Camelot and eat ham and jam and spam a lot!

Can someone recommend a faster alternative to Math::Combinatorics, or maybe suggest a better way of doing the following?

I'm trying to generate all multisets (bags) of a specific total "weight" (let's call it w), where each element comes from a given list (of numbers, in this case), and each list element may have multiplicity 0..w in each multiset. In other words, what I'm trying to generate is a list of w-tuples of elements of the given list — but unordered tuples rather than ordered ones.

An example may be instructive. Let's say w is 4, and the list is (0, 2, 3). Then I'd like to get the following multisets:

0,0,0,0 0,0,0,2 0,0,0,3 0,0,2,2 0,0,2,3 0,0,3,3 0,2,2,2 0,2,2,3 0,2,3,3 0,3,3,3 2,2,2,2 2,2,2,3 2,2,3,3 2,3,3,3 3,3,3,3
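As a sanity check on any generator, the number of multisets of size w over n distinct elements is the binomial coefficient C(n + w - 1, w), which gives 15 for the example above (n = 3, w = 4). A minimal sketch in core Perl; the name multiset_count is made up for illustration and is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: number of multisets of size $w drawn from $n
# distinct elements, i.e. the binomial coefficient C(n + w - 1, w).
sub multiset_count {
    my ($n, $w) = @_;
    my $result = 1;
    # After step $i the value is C(n - 1 + i, i), so each division is exact.
    for my $i (1 .. $w) {
        $result = $result * ($n - 1 + $i) / $i;
    }
    return $result;
}

printf "w=%d: %d multisets\n", $_, multiset_count(3, $_) for 1, 2, 3, 4, 7, 8;
```

For the list (0, 2, 3) this reports 3, 6, 10, 15, 36 and 45 multisets for w = 1, 2, 3, 4, 7 and 8.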

(The order in which the multisets themselves are generated isn't important to me either, BTW. I've only listed them in order for the sake of readability.)

Not wanting to implement this myself, I turned to CPAN and found Math::Combinatorics. This works, but it's fairly slow. Here's a (slightly simplified) excerpt from my code:

#!/usr/bin/perl
use Modern::Perl '2015';
use Math::Combinatorics;

my $states = 4;
foreach my $count (1, 2, 3, 4, 7, 8) {
    say "count=$count";
    my $iter = Math::Combinatorics->new(
        count     => $count,
        data      => [ grep { $_ != 1 } (0 .. ($states - 1)) ],
        frequency => [ ($count) x ($states - 1) ]
    );
    while(my @states = $iter->next_multiset) {
        say join(",", @states);
    }
}

This produces the desired output, but it takes about a minute and a half to run for $states = 4, and much longer for 5 and up:

time perl test.pl
count=1
3
0
2
count=2
0,0
0,2
0,3
2,3
2,2
3,3
count=3
3,0,0
3,0,2
3,0,3
3,2,3
3,2,2
3,3,3
0,0,2
0,0,0
0,2,2
2,2,2
count=4
3,2,0,3
3,2,0,2
3,2,0,0
3,2,3,2
3,2,3,3
3,2,2,2
3,0,3,0
3,0,3,3
3,0,0,0
3,3,3,3
2,0,2,2
2,0,2,0
2,0,0,0
2,2,2,2
0,0,0,0
count=7
2,0,2,3,0,0,3
2,0,2,3,0,0,0
2,0,2,3,0,0,2
2,0,2,3,0,3,3
2,0,2,3,0,3,2
2,0,2,3,0,2,2
2,0,2,3,3,3,2
2,0,2,3,3,3,3
2,0,2,3,3,2,2
2,0,2,3,2,2,2
2,0,2,0,0,0,0
2,0,2,0,0,0,2
2,0,2,0,0,2,2
2,0,2,0,2,2,2
2,0,2,2,2,2,2
2,0,3,0,0,3,0
2,0,3,0,0,3,3
2,0,3,0,0,0,0
2,0,3,0,3,3,3
2,0,3,3,3,3,3
2,0,0,0,0,0,0
2,2,3,3,3,2,3
2,2,3,3,3,2,2
2,2,3,3,3,3,3
2,2,3,3,2,2,2
2,2,3,2,2,2,2
2,2,2,2,2,2,2
2,3,3,3,3,3,3
0,3,0,0,3,0,0
0,3,0,0,3,0,3
0,3,0,0,3,3,3
0,3,0,0,0,0,0
0,3,0,3,3,3,3
0,3,3,3,3,3,3
0,0,0,0,0,0,0
3,3,3,3,3,3,3
count=8
3,0,0,0,0,2,2,2
3,0,0,0,0,2,2,3
3,0,0,0,0,2,2,0
3,0,0,0,0,2,3,3
3,0,0,0,0,2,3,0
3,0,0,0,0,2,0,0
3,0,0,0,0,3,3,3
3,0,0,0,0,3,3,0
3,0,0,0,0,3,0,0
3,0,0,0,0,0,0,0
3,0,0,0,2,2,2,2
3,0,0,0,2,2,2,3
3,0,0,0,2,2,3,3
3,0,0,0,2,3,3,3
3,0,0,0,3,3,3,3
3,0,0,2,2,2,2,3
3,0,0,2,2,2,2,2
3,0,0,2,2,2,3,3
3,0,0,2,2,3,3,3
3,0,0,2,3,3,3,3
3,0,0,3,3,3,3,3
3,0,2,2,2,2,3,3
3,0,2,2,2,2,3,2
3,0,2,2,2,2,2,2
3,0,2,2,2,3,3,3
3,0,2,2,3,3,3,3
3,0,2,3,3,3,3,3
3,0,3,3,3,3,3,3
3,2,2,2,2,3,3,3
3,2,2,2,2,3,3,2
3,2,2,2,2,3,2,2
3,2,2,2,2,2,2,2
3,2,2,2,3,3,3,3
3,2,2,3,3,3,3,3
3,2,3,3,3,3,3,3
3,3,3,3,3,3,3,3
0,0,0,0,2,2,2,2
0,0,0,0,2,2,2,0
0,0,0,0,2,2,0,0
0,0,0,0,2,0,0,0
0,0,0,0,0,0,0,0
0,0,0,2,2,2,2,2
0,0,2,2,2,2,2,2
0,2,2,2,2,2,2,2
2,2,2,2,2,2,2,2

real 1m34.525s
user 1m32.524s
sys 0m0.030s

90 seconds wouldn't be so bad, since this is part of a larger script to generate datafiles that only really needs to be run once (to generate the file). But I'd rather not spend days waiting for it to finish for higher values of $states.

Any suggestions? Like I said, I'd prefer to stick to CPAN, but I'll take what I can get.

Thanks!

Replies are listed 'Best First'.
Re: Faster alternative to Math::Combinatorics
by BrowserUk (Pope) on Sep 01, 2017 at 13:42 UTC

    I think this is doing the same thing in

    C:\test>1198509.pl >nul
    Took 0.064547 seconds

    with output redirected; or if not:

    C:\test>1198509.pl
    0
    2
    3
    0 0
    0 2
    0 3
    2 0
    2 2
    2 3
    3 0
    ...
    3 3 3 3 3 3 2 3
    3 3 3 3 3 3 3 0
    3 3 3 3 3 3 3 2
    3 3 3 3 3 3 3 3
    Took 6.103125 seconds

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];
    use Algorithm::Combinatorics qw[ variations_with_repetition ];

    my @data = ( 0, 2, 3 );
    my $start = time;
    for my $k ( 1 .. 8 ) {
        my $iter = variations_with_repetition( \@data, $k );
        print "@$_" while $_ = $iter->next;
    }
    printf STDERR "Took %f seconds\n", time() - $start;

      Funny that not redirecting the output makes such a big difference on your machine. For me, your code takes ~0.06 seconds to run when redirecting to /dev/null, and ~0.11 seconds if not.

      Be that as it may, thanks for the pointer to Algorithm::Combinatorics and the code snippet, this looks like a very useful module! And (redirecting to /dev/null, again) I'm getting running times of ~0.5s, ~2.9s, ~12.1s, ~40.6s for @data sizes of 4, ..., 7, which is very reasonable.

      EDIT: Of course, what I was actually looking for was multisets, not ordered tuples (did you read my post?), but fortunately Algorithm::Combinatorics also offers a combinations_with_repetition function for that. Funny that I completely missed this module when looking at CPAN earlier today, too.
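For reference, a minimal sketch of the multiset version, assuming Algorithm::Combinatorics is installed from CPAN. It is BrowserUk's loop with variations_with_repetition swapped for combinations_with_repetition; the timing harness is left out and no timings are claimed here:

```perl
use strict;
use warnings;
use Algorithm::Combinatorics qw(combinations_with_repetition);

my @data = (0, 2, 3);

for my $k (1, 2, 3, 4, 7, 8) {
    print "count=$k\n";
    # In scalar context the function returns an iterator; next() yields an
    # array reference per multiset and undef once the sequence is exhausted.
    my $iter = combinations_with_repetition(\@data, $k);
    while (my $multiset = $iter->next) {
        print join(",", @$multiset), "\n";
    }
}
```

Because the iterator hands back one multiset at a time, nothing forces the whole result set into memory, which matters as the list of states grows.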

Re: Faster alternative to Math::Combinatorics
by tybalt89 (Parson) on Sep 01, 2017 at 14:04 UTC

    Depending on how many elements you have :)

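    #!/usr/bin/perl -l

    # http://perlmonks.org/?node_id=1198509

    use strict;
    use warnings;

    my @elements = (0, 2, 3);
    my ($first, $last) = @elements[0, -1];
    my %next;    # maps each element to its successor in @elements
    @next{@elements} = @elements[1 .. @elements];

    for my $count (1,2,3,4,7,8) {
        print "count=$count";
        $_ = $first x $count;    # start with the smallest multiset
        # Print the current multiset, then advance: the rightmost element that
        # is not $last, and everything after it, becomes that element's
        # successor. The loop stops when no such element is left.
        # (Works only while every element is a single character.)
        1 while print(join ',', /./g),
            s/([^$last])($last*)$/ $next{$1} x length $& /e;
    }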

    Outputs:

    count=1 0 2 3 count=2 0,0 0,2 0,3 2,2 2,3 3,3 count=3 0,0,0 0,0,2 0,0,3 0,2,2 0,2,3 0,3,3 2,2,2 2,2,3 2,3,3 3,3,3 count=4 0,0,0,0 0,0,0,2 0,0,0,3 0,0,2,2 0,0,2,3 0,0,3,3 0,2,2,2 0,2,2,3 0,2,3,3 0,3,3,3 2,2,2,2 2,2,2,3 2,2,3,3 2,3,3,3 3,3,3,3 count=7 0,0,0,0,0,0,0 0,0,0,0,0,0,2 0,0,0,0,0,0,3 0,0,0,0,0,2,2 0,0,0,0,0,2,3 0,0,0,0,0,3,3 0,0,0,0,2,2,2 0,0,0,0,2,2,3 0,0,0,0,2,3,3 0,0,0,0,3,3,3 0,0,0,2,2,2,2 0,0,0,2,2,2,3 0,0,0,2,2,3,3 0,0,0,2,3,3,3 0,0,0,3,3,3,3 0,0,2,2,2,2,2 0,0,2,2,2,2,3 0,0,2,2,2,3,3 0,0,2,2,3,3,3 0,0,2,3,3,3,3 0,0,3,3,3,3,3 0,2,2,2,2,2,2 0,2,2,2,2,2,3 0,2,2,2,2,3,3 0,2,2,2,3,3,3 0,2,2,3,3,3,3 0,2,3,3,3,3,3 0,3,3,3,3,3,3 2,2,2,2,2,2,2 2,2,2,2,2,2,3 2,2,2,2,2,3,3 2,2,2,2,3,3,3 2,2,2,3,3,3,3 2,2,3,3,3,3,3 2,3,3,3,3,3,3 3,3,3,3,3,3,3 count=8 0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,2 0,0,0,0,0,0,0,3 0,0,0,0,0,0,2,2 0,0,0,0,0,0,2,3 0,0,0,0,0,0,3,3 0,0,0,0,0,2,2,2 0,0,0,0,0,2,2,3 0,0,0,0,0,2,3,3 0,0,0,0,0,3,3,3 0,0,0,0,2,2,2,2 0,0,0,0,2,2,2,3 0,0,0,0,2,2,3,3 0,0,0,0,2,3,3,3 0,0,0,0,3,3,3,3 0,0,0,2,2,2,2,2 0,0,0,2,2,2,2,3 0,0,0,2,2,2,3,3 0,0,0,2,2,3,3,3 0,0,0,2,3,3,3,3 0,0,0,3,3,3,3,3 0,0,2,2,2,2,2,2 0,0,2,2,2,2,2,3 0,0,2,2,2,2,3,3 0,0,2,2,2,3,3,3 0,0,2,2,3,3,3,3 0,0,2,3,3,3,3,3 0,0,3,3,3,3,3,3 0,2,2,2,2,2,2,2 0,2,2,2,2,2,2,3 0,2,2,2,2,2,3,3 0,2,2,2,2,3,3,3 0,2,2,2,3,3,3,3 0,2,2,3,3,3,3,3 0,2,3,3,3,3,3,3 0,3,3,3,3,3,3,3 2,2,2,2,2,2,2,2 2,2,2,2,2,2,2,3 2,2,2,2,2,2,3,3 2,2,2,2,2,3,3,3 2,2,2,2,3,3,3,3 2,2,2,3,3,3,3,3 2,2,3,3,3,3,3,3 2,3,3,3,3,3,3,3 3,3,3,3,3,3,3,3 real 0m0.023s user 0m0.018s sys 0m0.009s

      Depending on how many elements you have :)

      Good question! The number could be arbitrarily high in theory; in practice I doubt it will leave the two-digit range¹. (In case anyone's wondering, BTW, this is related to multistate cellular automata; w is a neighborhood count, and the list I mentioned is a list of (some) states of the CA in question.)

      I'll have to take a long look at your code to really understand it, but wow, those timings look marvellous. Thank you! If this ends up working out I'll definitely owe you a beer.

      Footnotes:

      1. Edited to add: in fact, since the amount of data generated will increase exponentially with the number of CA states, I think realistically, it won't even leave the single-digit range.

        Here's a version of the same algorithm working on array elements instead of characters. (Messier, isn't it?)

        With 10 elements (states) it takes 1.5 seconds on my machine, but I think most of the time is taken by the printing.

        How many neighbors can you have? (for code testing purposes.)

        #!/usr/bin/perl -l

        # http://perlmonks.org/?node_id=1198509

        use strict;
        use warnings;

        my @elements = (0, 2, 3);
        @elements = 0..9;
        my ($first, $last) = @elements[0, -1];
        my %next;    # maps each element to its successor in @elements
        @next{@elements} = @elements[1 .. $#elements];

        for my $count (1,2,3,4,7,8) {
            print "count=$count";
            my @set = ($first) x $count;    # start with the smallest multiset
            local $, = ',';
            while(1) {
                print @set;
                # Find the rightmost position that does not hold $last.
                my $i = $#set;
                $i-- while $i >= 0 and $set[$i] eq $last;
                $i < 0 and last;            # everything is $last: done
                # Bump that position, and everything after it, to its successor.
                @set[$i .. $#set] = ($next{$set[$i]}) x ( @set - $i);
            }
        }

Re: Faster alternative to Math::Combinatorics
by talexb (Canon) on Sep 01, 2017 at 13:38 UTC

    I wrote my own Permutation module in this post .. but it looks like we may have different ideas about a permutation. Your example doesn't list all of the permutations of (0, 2, 3) in an array of four elements .. it just has arrays where the elements exist in ascending order. For example, (0, 0, 2, 0) isn't in your list, and I can't tell if that omission is intentional or not.

    In any case, I'd write code to do the deed myself, and perhaps benchmark it against the module you've chosen -- it could be that your code is faster because it has less overhead. If that quick test fails, you may have to put on your thinking cap and simplify the algorithm. Or allocate a couple of hours to generate the test cases.

    Alex / talexb / Toronto

      I wrote my own Permutation module in this post .. but it looks like we may have different ideas about a permutation. Your example doesn't list all of the permutations of (0, 2, 3) in an array of four elements .. it just has arrays where the elements exist in ascending order. For example, (0, 0, 2, 0) isn't in your list, and I can't tell if that omission is intentional or not.

      It's intentional: I'm only looking for unordered tuples. Think of it as simultaneously drawing w (e.g. 3) balls from an urn that contains, for every element n of the underlying list (e.g. (0, 2, 3)), at least w balls marked n. Generating all possible permutations would be quite easy, otherwise.

      I actually briefly entertained the thought of generating all possible tuples and then removing ones that are in the same equivalence class as previously-seen ones, but for larger w and longer lists, this would take a fair amount of time and memory. w won't go beyond 8, but the lists might be arbitrarily long, at least in theory.
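To make the cost of that generate-and-filter idea concrete, here is a rough sketch in core Perl. The sub name multisets_by_filtering is hypothetical, and numeric elements are assumed so that the sorted tuple can serve as the key for its equivalence class:

```perl
use strict;
use warnings;

# Hypothetical brute-force helper: build all n**w ordered tuples, then keep
# one representative (the sorted form) per equivalence class.
sub multisets_by_filtering {
    my ($w, @list) = @_;
    my @tuples = ([]);
    for my $step (1 .. $w) {
        my @extended;
        for my $tuple (@tuples) {
            push @extended, [@$tuple, $_] for @list;
        }
        @tuples = @extended;
    }
    my (%seen, @result);
    for my $tuple (@tuples) {
        my @sorted = sort { $a <=> $b } @$tuple;
        my $key = join ",", @sorted;
        push @result, \@sorted unless $seen{$key}++;
    }
    # First return value: how many ordered tuples had to be built.
    return (scalar @tuples, @result);
}

my ($generated, @kept) = multisets_by_filtering(4, 0, 2, 3);
printf "generated %d tuples, kept %d multisets\n", $generated, scalar @kept;
# prints "generated 81 tuples, kept 15 multisets"
```

The number of tuples built grows as n**w, and all of them sit in memory before filtering, which is why this only works for small inputs.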

      (Without having checked I actually have a gut feeling that this is what Math::Combinatorics might be doing under the hood — it would explain why it's so slow, and why the first multisets I draw from it appear much faster than the later ones!)

      In any case, I'd write code to do the deed myself, and perhaps benchmark it against the module you've chosen -- it could be that your code is faster because it has less overhead. If that quick test fails, you may have to put on your thinking cap and simplify the algorithm. Or allocate a couple of hours to generate the test cases.

      Yeah, that's definitely an option. But I'm lazy (it's one of the chief virtues of a programmer!), and therefore prefer to, in order:

      1. Find CPAN modules that do what I want;
      2. Get the brothers and sisters on Perlmonks to write code for me... oops, did I say that out loud? Of course, I actually mean:
      3. Get ideas, pointers to known standard algorithms etc. from Perlmonks;
      4. Solve the (mathematical) problem myself and then write my own code.

      Thanks for taking the time to reply, BTW!

        I'm only looking for unordered tuples
        The mathematical terminology is kind of backwards from the computer science terminology here. You mean that the order of the tuple doesn't matter, and you've put your examples in numerically ascending sequence for neatness. A programmer looks at that and says, "those tuples are ordered."
Re: Faster alternative to Math::Combinatorics
by Laurent_R (Canon) on Sep 02, 2017 at 11:53 UTC
    Hi AppleFritter,

    further to my previous post (http://www.perlmonks.org/?node_id=1198575), this is another version of my program which seems to produce results more in line with what you seem to be expecting (assuming I have understood what you're expecting). Or, at least, the number of output lines seems to be what you're looking for.

    use strict;
    use warnings;

    my @list = (0, 2, 3);
    for my $w (1, 2, 3, 4, 7, 8) {
        print "count = $w\n";
        make_sets2($w, "", @list);
    }

    sub make_sets2 {
        my $weight = shift;
        my $temp_result = shift;
        print "$temp_result\n" and return if $weight <= 0;
        while (@_) {
            my $item = shift;
            make_sets2( $weight -1, "$temp_result$item, ", ($item, @_));
        }
    }

    Now, the (abbreviated) result looks like this:

    Since the output is much smaller, redirecting the output does not make a very significant performance difference:

    $ time perl multisets.pl > /dev/null
    real 0m0.043s
    user 0m0.000s
    sys 0m0.031s

    But, frankly, I still do not understand why you seem to be looking for a subset of all the possible combinations.
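If the multisets are needed as data rather than printed text, the recursion in make_sets2 can be adapted to return array references. This is only a sketch: the sub name multisets and its $start/$prefix parameters are invented here, and it passes an index instead of shifting the argument list.

```perl
use strict;
use warnings;

# Hypothetical variant of make_sets2 that collects results instead of
# printing them. $start is the index of the first element still allowed,
# which keeps each result in the order of the input list.
sub multisets {
    my ($w, $list, $start, $prefix) = @_;
    $start  //= 0;
    $prefix //= [];
    return [@$prefix] if $w == 0;
    my @results;
    for my $i ($start .. $#$list) {
        push @results, multisets($w - 1, $list, $i, [@$prefix, $list->[$i]]);
    }
    return @results;
}

print join(",", @$_), "\n" for multisets(3, [0, 2, 3]);
```

Allowing the chosen element again (index $i rather than $i + 1) is what permits repetition, while never going back to an earlier index is what rules out reordered duplicates.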
Re: Faster alternative to Math::Combinatorics
by Laurent_R (Canon) on Sep 02, 2017 at 11:40 UTC
    Hi AppleFritter,

    you've already received good solutions for your problem, but I thought it might be beneficial to provide a basic algorithm to solve it. This is using recursion.

    (I know I'm coming a bit late, but I did not have time yesterday to code and test anything.)

    Anyway, this is my (first) solution:

    use strict;
    use warnings;

    my @list = (0, 2, 3);
    for my $w (1, 2, 3, 4, 7, 8) {
        print "count = $w\n";
        make_sets1($w, "");
    }

    sub make_sets1 {
        my ($weight, $temp_result) = @_;
        print "$temp_result\n" and return if $weight <= 0;
        for my $item (@list) {
            make_sets1( $weight -1, "$temp_result$item, ");
        }
    }

    If I run this, I get the following (abbreviated) output:

    Or, redirecting the output:

    $ time perl multisets.pl > /dev/null
    real 0m0.057s
    user 0m0.015s
    sys 0m0.015s

    So, this is pretty fast.

    Now, there is a slight problem. This program is not producing the same results as those you appear to be expecting (I am not talking about the different formatting, but about the number and list of results).

    For example, looking only at the weight of 3, I get this:

    count = 3
    0, 0, 0,
    0, 0, 2,
    0, 0, 3,
    0, 2, 0,
    0, 2, 2,
    0, 2, 3,
    0, 3, 0,
    0, 3, 2,
    0, 3, 3,
    2, 0, 0,
    2, 0, 2,
    2, 0, 3,
    2, 2, 0,
    2, 2, 2,
    2, 2, 3,
    2, 3, 0,
    2, 3, 2,
    2, 3, 3,
    3, 0, 0,
    3, 0, 2,
    3, 0, 3,
    3, 2, 0,
    3, 2, 2,
    3, 2, 3,
    3, 3, 0,
    3, 3, 2,
    3, 3, 3,

    whereas you seem to be expecting:

    count=3
    3,0,0
    3,0,2
    3,0,3
    3,2,3
    3,2,2
    3,3,3
    0,0,2
    0,0,0
    0,2,2
    2,2,2

    I get 27 combinations and you seem to be expecting only 10. I don't understand, for example, why you have only one result starting with 2, and why you don't have (2,0,0), (2,0,2), (2,0,3), (2,2,0), ... and so on. I would have expected you to want all the possible combinations with repetition (i.e. 3 ** 3 = 27 combinations), but that doesn't appear to be what you're after. Or is there an error in your expected result? Or did I miss part of your explanation?

    I'll make another post with a modified program which appears to produce something closer to what you seem to be expecting.

    Update: Modified the list of "missing" combinations listed just above to reflect the input data (0, 2, 3 and not 0, 1, 2).

      More answers and solutions are always good, so thanks a lot for your effort!

      The reason you're getting 27 combinations is that you're producing ordered tuples, whereas what I'm looking for is multisets (which are by definition unordered). I'll quote what I wrote in my post:

      I'm trying to generate all multisets (bags) of a specific total "weight" (let's call it w), where each element comes from a given list (of numbers, in this case), and each list element may have multiplicity 0..w in each multiset. In other words, what I'm trying to generate is a list of w-tuples of elements of the given list, but unordered tuples rather than ordered ones.

      What this means is that:

      • Results that contain the same numbers a different number of times are distinguished: "2,3,3" is not the same as "2,2,3".
      • Results that are merely reordered are NOT distinguished: "2,3,3" is the same as "3,2,3" and "3,3,2".

      Wikipedia has more on multisets: Multiset.

      Like I said in my reply to tybalt89, the underlying problem I was trying to solve here¹ is related to a certain class of cellular automata with multiple states. I needed to generate all the possible combinations of states a certain subset of a given cell's immediate neighborhood could be in — but I was only interested in outer-totalistic CAs where the specific alignment of those neighboring cells didn't matter. Hence: it makes a difference whether of three cells I'm considering, one is in state 2 and two in state 3, or two in state 2 and one in state 3; but it doesn't make a difference specifically where in the center cell's neighborhood those neighboring cells are. Multisets / multicombinations were a natural choice for representing that.
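As an illustration of why position drops out, a neighborhood can be reduced to a canonical key built from how often each state occurs. This is a sketch only: neighborhood_key and its "state:count" key format are invented for this example and are not taken from the actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an ordered list of neighbor states to a
# canonical "state:count" key, so arrangements that differ only in where
# the states sit compare equal.
sub neighborhood_key {
    my @states = @_;
    my %count;
    $count{$_}++ for @states;
    return join " ", map { $_ . ":" . $count{$_} } sort { $a <=> $b } keys %count;
}

print neighborhood_key(3, 2, 3), "\n";   # one cell in state 2, two in state 3
print neighborhood_key(3, 3, 2), "\n";   # same multiset, different positions
print neighborhood_key(2, 3, 2), "\n";   # a different multiset
```

The first two calls produce the same key and the third a different one, matching the distinction drawn above between multiplicity and position.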

      TL;DR — thanks again, I appreciate all the good replies, tips, pointers and suggestions I got!

      Footnotes:

      1. And which I have solved; it turns out that Algorithm::Combinatorics not only has a convenient function (combinations_with_repetition) to generate just what I need, but that function is also blindingly fast. See my reply to BrowserUk, who also misunderstood (or didn't read carefully, one imagines) my question.

        OK, AppleFritter, thanks for your answer.

        Thinking more about it, what I get with this program is actually 27 permutations, not 27 combinations. But English is not my mother tongue (and, as far as I can tell, probably also not yours), so I probably got a bit confused about it. And I did not know anything about multisets before (or had forgotten everything about them).

        Thanks a lot for the clarification.

        I haven't checked thoroughly, but it seems that the program I have suggested in my other post (http://www.perlmonks.org/?node_id=1198576) probably does what you want.

        Update: And, BTW, my code in the other post runs in less than 1/20th of a second, so it is also fairly fast.

Re: Faster alternative to Math::Combinatorics
by Anonymous Monk on Sep 01, 2017 at 14:22 UTC
    Great, now I'm going to have Pie Iesu Domine running through my head all day. What you are looking for is "combinations with repetition", and Algorithm::Combinatorics has it.

      Great, now I'm going to have Pie Iesu Domine running through my head all day.

      Then my work here is done! ;) Thank you for the pointer, I'll look into this module!

      P.S. ...dona eis requiem! *bonk*
