good chemistry is complicated, and a little bit messy LW 

PerlMonks 
Re: Sampling from Combination Spaceby blokhead (Monsignor) 
on Jul 18, 2005 at 05:48 UTC ( #475645=note: print w/ replies, xml )  Need Help?? 
For these kinds of problems, I always consult my favorite online book, Combinatorial Generation by Frank Ruskey. On page 87 of the PDF, it derives convenient ranking and unranking algorithms for combinations. Here they are in Perl: The rank function takes a combination of size k from {1 .. N} into an integer in {0 .. C(N,k)1}. The unrank function goes the other direction. In fact, N does not even need to be known to these algorithms (and k only to the unrank algorithm). Since it makes a lot of use of computing binomial coefficients, I've used Math::Pari's version and also memoized it for speed. However, I haven't gotten the algorithm to work with large Pari numbers (I can't remember how to work with Pari objects, and it's getting late). Now, assuming you can get these algorithms to work on the large numbers involved, and you can sample from the appropriately large range of integers, you can perform any sampling method you want on {0 .. C(N,k)1} and apply it to the set of combinations. .... Having directly addressed your question, let me say that I find it hard to imagine that the naïve sampling approach (repeatedly choosing a uniformly random ksubset) would significantly skew any statistical properties. The sample space is so large and the chance of collision is so small. Are you absolutely sure that all this effort to avoid repeated samples isn't overkill? blokhead
In Section
Seekers of Perl Wisdom

