perlquestion
Amendil
<p>Hello Perl Monks,</p>
<p>I'm working on a TSV file in which one column holds a comma-separated list of keywords (28 unique values overall). I'd like to compute the Jaccard index (|intersection| / |union|) between pairs of these keyword lists.
To do so efficiently, I'd like to represent each keyword list as a bit array, one bit per keyword.</p>
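<p>To make the target concrete, here is a minimal sketch of the same index computed the slow way with hashes. The sub name <c>jaccard_from_lists</c> and the keywords <c>k0</c>, <c>k1</c>, <c>k2</c> are placeholders for illustration, not part of my real data.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: Jaccard index of two keyword lists, using hashes as sets.
sub jaccard_from_lists {
    my ($list_a, $list_b) = @_;
    my %in_a  = map { $_ => 1 } @$list_a;
    my %in_b  = map { $_ => 1 } @$list_b;
    my %union = (%in_a, %in_b);

    # grep in scalar context returns the number of shared keywords.
    my $intersection_size = grep { $in_b{$_} } keys %in_a;
    my $union_size        = keys %union;

    # Two empty lists: return 0 rather than dividing by zero (an arbitrary choice).
    return $union_size ? $intersection_size / $union_size : 0;
}

# Placeholder keywords: one shared keyword out of three distinct ones.
printf "%.4f\n", jaccard_from_lists([qw(k0 k1)], [qw(k1 k2)]);
```

<p>The bit-array version should give the same ratio, with the two sizes obtained by counting set bits.</p>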
<p>I have read a few articles on PerlMonks and Stack Overflow, but so far I feel I'm missing something completely obvious.</p>
<p>Here is what I wrote:</p>
<code>
use common::sense;
my $a = '';
my $b = '';
$a += 1 << 0;
$a += 1 << 1;
$b += 1 << 1;
$b += 1 << 2;
my $i = $a & $b;
my $u = $a | $b;
my $i_cnt = unpack '%32b*', $i;
my $u_cnt = unpack '%32b*', $u;
printf "a is %#032b %d\n", $a, $a;
printf "b is %#032b %d\n", $b, $b;
printf "intersection is %#032b %d\n", $i, $i;
printf "union is %#032b %d\n", $u, $u;
say "set bit count in intersection: $i_cnt";
say "set bit count in union: $u_cnt";
</code>
<p>Actual result:</p>
<code>
a is 0b000000000000000000000000000011 3
b is 0b000000000000000000000000000110 6
intersection is 0b000000000000000000000000000010 2
union is 0b000000000000000000000000000111 7
set bit count in intersection: 3
set bit count in union: 5
</code>
<p>Expected result:</p>
<code>
a is 0b000000000000000000000000000011 3
b is 0b000000000000000000000000000110 6
intersection is 0b000000000000000000000000000010 2
union is 0b000000000000000000000000000111 7
set bit count in intersection: 1
set bit count in union: 3
</code>
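<p>One observation, offered tentatively: the counts 3 and 5 are exactly the number of set bits in the ASCII characters "2" (0x32) and "7" (0x37), which suggests <c>unpack</c> is looking at the decimal string of the number rather than at the integer itself. The sketch below assumes that is the cause and packs the mask into four raw bytes before counting. The helper name <c>popcount32</c> is hypothetical, and <c>pack 'N'</c> limits it to 32-bit masks, which is enough for 28 keywords.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: count the set bits of an integer mask (up to 32 bits).
# pack 'N' converts the number to 4 raw bytes, so the %32b* checksum sums
# the mask's own bits instead of the bits of its decimal string.
sub popcount32 {
    my ($mask) = @_;
    return unpack '%32b*', pack 'N', $mask;
}

my $x = (1 << 0) | (1 << 1);    # keywords 0 and 1
my $y = (1 << 1) | (1 << 2);    # keywords 1 and 2

my $i_cnt = popcount32($x & $y);
my $u_cnt = popcount32($x | $y);

print "intersection bits: $i_cnt, union bits: $u_cnt\n";
printf "jaccard: %.4f\n", $u_cnt ? $i_cnt / $u_cnt : 0;
```

<p>With these two masks the sketch reports 1 and 3, matching the expected result above. I have not checked whether this is the most efficient way to count bits.</p>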