good chemistry is complicated,and a little bit messy -LW PerlMonks

### Algorithm to convert combinations to bitstring

by Limbic~Region (Chancellor)
 on Oct 18, 2006 at 16:23 UTC Need Help??

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
The formula to determine the number of combinations (C) from a group of items (N) when choosing a fixed number of items at a time (K) is C = N! / K! * (N - K)!

So if I had a list of 5 items and I chose 2 at a time, I would have 10 combinations 120 / 2 * 6

```ABCDE = AB, AC, AD, AE, BC, BD, BE, CD, CE, DE

Assuming:

• 1. The total number of items N is known
• 2. The number at a time K is known
• 3. The number of combinations C is known
• 4. All items of N are unique
• 5. All items of N are in ascending order
• 6. Combinations will be generated as shown above (or the reverse as shown above)
It should be possible to use a bitstring the size of C to represent each combination. My question is what algorithm could I use to convert say 'CD' to 8 (or 7 if 0 based) and flip that bit. The converse algorithm (taking the bit and obtaining the combination) is not needed but feel free to have fun.

Cheers - L~R

Replies are listed 'Best First'.
Re: Algorithm to convert combinations to bitstring
by blokhead (Monsignor) on Oct 18, 2006 at 16:56 UTC
When you have a set of objects S, and you want to build efficient mappings between S and {1, ..., |S|} that preserve some ordering over S (say, lexicographic ordering on S corresponds to ascending order in the integers), you're looking for ranking & unranking algorithms.

You can find ranking/unranking algorithms for combinations in this node: Re: Sampling from Combination Space. It's just an implementation of algorithms I found in a textbook (source given in that node). I believe the ranking is with respect to lexicographic ordering, but I don't remember for sure. (In that thread, the ordering of the combinations was irrelevant) Also the textbook probably has implementation details (like running time complexity). It may not be online anymore, but if you want I can send you a copy of my PDF.

Hopefully this does what you need, or at least gives you a place to start. You'll certainly have to adapt it to use alphabet A..Z instead of 1..26, and get the off-by-one things the way you need.

Re: Algorithm to convert combinations to bitstring (size)
by tye (Sage) on Oct 18, 2006 at 18:25 UTC

I suspect it may be much faster to just assign one bit to each member of the superset and set that bit iff that member is in the subset. Then you need N bits not log2( C(N,K) ) bits. The size difference is likely to be fairly small (you don't mention your values for N and K, though I'm pretty sure you have specific values in mind) while the speed difference in conversion will likely be considerable.

Update: Alternately, if the size difference is huge, then that probably means that K is close to either 0 or N so you could use log2(N**K) or log2(N**(N-K)) bits and treat it as a base-N number where each "digit" represents a member of the subset.

Update2: In the ChatterBox, Limbic~Region explained that he wants to use the bit string to track a subset of the set of all possible combinations (one bit for each possible combination). So remove the log2()s from my equations and replace N with 2**N, which makes the space difference much more significant.

- tye

tye,
N = 26 always and I will have 25 different bitstrings for K = 1 .. 25. I am not sure I understand what you mean by your 1 bit per superset and then set the bit if that member is in the subset.
```ABCDE = 00000
AC    = 10100
DE    = 10100 + 00011 = 10111 ?

Cheers - L~R

26 bits isn't very many so A=2**0=1, B=2**1=2, ..., Z=2**25=0x2000000. ABZ=0x2000003.

- tye

Re: Algorithm to convert combinations to bitstring
by doowah2004 (Monk) on Oct 18, 2006 at 17:50 UTC
If you just want to know the place of the combination, then you could do the following:

```let n be the 0 based length of your set (for ABCDE n = 4)
let x be the 0 based position of the first element (for 'CD' x = 2)
let y be the 0 based position of the second element (for 'CD' y  = 3)
The position in the outputed combination would be:

[n(n+1)-x(x+1)]/2 + y-x

so for 'CD': [4(4+1) - 2(2+1)]/2 + 3 - 2
==>   [20 - 6]/2 + 1 = 8 (solution is not 0 based)

not sure if that is what you wanted, but fun to think about.

*Updated* added the code tags per L~R's request.
doowah2004,
Please use <CODE> tags to prevent things in [] from being interpreted as a link. FWIW, I already had a solution in mind before posting this but didn't include it as to not influence anyone. Your solution is similar to mind except you have not generalized it - what if instead of 2 items at a time I am taking 4?

Cheers - L~R

I will use an arbitrary example to show my line of think though it is not completely pieced together. Anyway this is what I have come up with:

for the alphabet (A-Z) n = 26
let k = 3
we want the location of 'HRY'
to find the "start location for 'H', that is the first instance where 'H' is the first character, we use:

```sum[(n-x)!/((k-y)!*(n-x-k)!)]as x=0..z
where:
y = the base 1 position of 'H' in 'HRY' (in this case 1)
z = the base 1 position of 'H' in the alphabet

There are 2 special cases, when k = 2 and when k = n. I will not address k = n because it is uninteresting, and for k = 2 see my previous comment in this thread.

If k > 3, then you would sum across the above equation for each character in the combination making the a appropriate substitution for y, and making a substitution for n such that n' = n - the alphabetic position of the character before it.

Hope that this is not too messy. Also this was done with pen and paper, so it is untested.
Re: Algorithm to convert combinations to bitstring
by ambrus (Abbot) on Oct 30, 2006 at 17:19 UTC

Here's some old code on this topic. Feel free to experiment with it.

```use warnings;
use strict;

sub perm_decode {
my(@perm) = @_;
my(\$n, \$k, \$b, \$v, \$e);
(\$n, \$k, \$b, \$v) = (0, 0, 0, 0);
for \$e (@perm) {
#\$b == choose(\$n, \$k - 1) or die "assertion failed";
\$e or \$v += \$b;
\$n++;
\$e and \$k++;
\$b = \$k <= 1 ?
(\$k < 1 ? 0 : 1) :
(\$b * \$n / (\$e ? \$k - 1 : \$n - \$k + 1));
}
\$b = \$k <= 0 ? 1 : (\$b * (\$n - \$k + 1)) / \$k;
#\$b == choose(\$n, \$k) or die "assertion failed";
\$n, \$k, \$v, \$b;
}

sub perm_encode {
my(\$n, \$k, \$v) = @_;
my(@r, \$bi);
while (0 < \$n) {
\$n--;
\$bi = choose(\$n, \$k - 1);
if (\$v < \$bi) {
unshift @r, 1;
\$k--;
} else {
\$v -= \$bi;
unshift @r, 0;
}
}
@r;
}

Update. Here's how to use the above subroutine.

The perm_decode function accepts a permutation of K zeros and N-K ones as input, but the K letter word can easily be converted to such a permutation with this function. As you can see, the perm_decode function returns N, K, and C apart from the rank.

```sub word_to_indicator {
my(\$n, \$w) = @_;
my(\$p, \$d) = 0;
reverse map {
\$p += \$d = chr(ord("A") + \$_) eq substr(\$w, \$p, 1) ? 1
+ : 0;
\$d;
} 0 .. \$n - 1;
}

if (1) {
# here, N = 5, K = 2.
my @combination = qw{AB AC AD AE BC BD BE CD CE DE};
for my \$combination (@combination) {
my @indicator = word_to_indicator(5, \$combination);
my(\$N, \$K, \$rank, \$C) = perm_decode(@indicator);
print "\$combination : @indicator : \$N \$K \$C \$rank\n";
}
}
Re: Algorithm to convert combinations to bitstring
by spx2 (Deacon) on Dec 31, 2009 at 20:13 UTC
Using Bit::Vector you can generate subsets of some set. Using -> increment to increment its value and ->contains to check a particular bit and ->Bit_On and ->Bit_Off to set bits. I think it suits your problem. So in your case if you go over all bitstrings of 5 bits that have just 2 lit, you get what you have in your example.
spx2,
Would you mind coding up a complete example? I ask because I haven't used Bit::Vector that much and don't immediately see how it could be used to accomplish this task. I solved this problem a long time ago but am always interested in seeing how abstraction modules can make my life easier.

Keep in mind that the goal would be to have a known set ( say 'A' .. 'Z') and a known subset size (say 5). Then, given a valid subset of the specified size (say AMTXZ), determine if that subset has been "seen" before. If not, perform some action and then mark it as being "seen". Here is some starter code to give you an idea of what I have already accomplished:

```my @set = 'A' .. 'Z';
my \$subset_size = 5;
my @subset = qw/A M T X Z/;

process_subset(@subset);
process_subset(@subset);

sub process_subset {
if (seen(@_)) {
return;
}
print "Performing expensive operation on '@_'\n";
set_seen(@_);
}

sub seen {
# For you to implement
# ...
}

sub set_seen {
# For you to implement
# ...
}

Keep in mind that @set, @subset_size and @subset are all subject to change - so it has to be a general purpose solution.

Cheers - L~R

Hi L~R,
```  1 use List::AllUtils qw/sum/;
2 use Bit::Vector;
3
4 my \$k = 2;
5
6 my @letters = split '',"ABCDE";
7 my \$n = scalar(@letters);
8
9 my \$vec = Bit::Vector->new(\$n);
10
11 while(!\$vec->increment){
12     my \$i = \$vec->to_Bin; # current bitstring
13     my \$s = sum(split '',\$i); # sum up to find out how many bits a
+re lit
14
15     next unless (\$s == \$k); # just the ones with \$k bits lit
16
17
18     my \$str = # convert bitstring to string
19         join '',
20         map { \$letters[\$_]  }
21         grep { \$vec->contains(\$_);  }
22         0..\$n-1 ;
23
24     print "\$s \$i \$str\n";
25 };

The seen and set_seen routines are incorporated in the check at line 15.

The output for this is:

```2 00011 AB
2 00101 AC
2 00110 BC
2 01010 BD
2 01100 CD
2 10001 AE
2 10010 BE
2 10100 CE
2 11000 DE

Re: Algorithm to convert combinations to bitstring
by ikegami (Pope) on Oct 18, 2006 at 18:30 UTC

```my %lookup = map { \$_ => 1 << (ord(\$_) - ord('A')) } 'A'..'Z';

my \$c = 'ABCHI';

my \$bitmap = 0;
\$bitmap |= \$lookup{\$_} for \$c =~ /(.)/g;

print('A'..'Z', "\n");
print(scalar(reverse(sprintf("%26b", \$bitmap))), "\n");

outputs

```ABCDEFGHIJKLMNOPQRSTUVWXYZ
111000011

Update: Array version:

```my @powers = map { 1 << \$_ } 0..25;

my \$c = 'ABCHI';

my \$bitmap = 0;
\$bitmap |= \$powers[ord(\$_)-ord('A')] for \$c =~ /(.)/g;

print('A'..'Z', "\n");
print(scalar(reverse(sprintf("%26b", \$bitmap))), "\n");

Update: Replaced += with |=. It makes it tolerant to repeated letters.

ikegami,
You will have to forgive me, but I have no idea how this solves the problem I asked. It could very well be that I have not done a good job of explaining what I want but in your example:
```N = 26
K = 25
C = 65,780
I do not see how your output tells me that the combination representing 'ABCHI' which should be a single bit in a bitstring of 65,780 bits is present?

Cheers - L~R

Create A New User
Node Status?
node history
Node Type: perlquestion [id://579130]
Approved by grep
Front-paged by andyford
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2019-09-15 06:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
The room is dark, and your next move is ...

Results (178 votes). Check out past polls.

Notices?