Pathologically Eclectic Rubbish Lister PerlMonks

### Comment on

 Need Help??

I've had to add and edit a bit of code outside the actual compared routines. The OP mentioned that the ratio of 1 to 0 entries is 1::500, which is a fact I used to come up with my approach. So the first change is the ability to set the probability of 1 bits.

Since the 1 bits are sparse, rather than making an explicit vector, I use a list of the positions of the 1 bits. In order to come up with the same results, I converted the Y vectors list format.

The biggest change outside of the comparison routine is the setup routine that transforms the xArray. The concept was to do something like this: Build an artificial set of vectors each with 1 bit--one vector per bit position. Then we compare each of these artificial vectors against the xArray set, resulting in a list of x vectors for each bit position. Then in our comparison, we aggregate the selected bins. Thus, if y has five bits in it, we add in the five partial products from the eigenset vectors. So the process of building the lists is amortized over the run of comparisons.

Having said all that, here it is. As mentioned previously, I came up with my approach when I saw that the distribution of 1s was very sparse. As the density of 1s increases, the routine gets progressively slower.

```\$ perl 1067357_mcm.pl -I=1 -N=100 -W=4000 -P=.05
<<< snipped >>>
Rate   array strings    bits    robo
array   2.33e-02/s      --    -98%    -99%   -100%
strings     1.45/s   6122%      --    -25%    -72%
bits        1.92/s   8156%     33%      --    -63%
robo        5.26/s  22495%    263%    174%      --

\$ perl 1067357_mcm.pl -I=1 -N=100 -W=4000 -P=.5
<<< snipped >>>
s/iter   array    robo strings    bits
array     60.4      --    -87%    -99%    -99%
robo      7.75    680%      --    -91%    -93%
strings  0.690   8659%   1023%      --    -23%
bits     0.530  11304%   1362%     30%      --
```#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];
use Data::Dump qw[ pp ]; \$Data::Dump::WIDTH = 500;

our \$I //= -1;
our \$N //= 1000;
our \$W //= 15_000;

# OP mentions that 1 bits are sparse (1 :: 500)  It doesn't make a dif
+ference for straight
# bit comparison techniques, but it *does* for mine.
our \$P //= 0.002;
our @xArrays = map[ map { rand()<\$P ? 1 : 0 }  1 .. \$W ], 1 .. \$N;
our @yArrays = map[ map { rand()<\$P ? 1 : 0 }  1 .. \$W ], 1 .. \$N;

our @xStrings = map{ join '', @\$_  } @xArrays;
our @yStrings = map{ join '', @\$_  } @yArrays;

our @xBits = map{ pack 'b*', \$_ } @xStrings;
our @yBits = map{ pack 'b*', \$_ } @yStrings;

# Because the bit vectors are sparse, I'm storing them as lists of bit
+ positions, rather
# than bit vectors, so I'm converting Y
my \$start = time;
our @yArrays2;
for my \$y (0 .. \$#yArrays) {
for my \$bitcol (0 .. \$W-1) {
push @{\$yArrays2[\$y]}, \$bitcol if \$yArrays[\$y][\$bitcol];
}
}
my \$cur = time - \$start;
print "Converting yArrays format took \$cur sec.\n";

# I bin the X arrays after which I never use them again, so I'll bin t
+hem directly from the
# originally generated vectors.
\$start = time;
our @bitcols;
for my \$x (0 .. \$#xArrays) {
for my \$bitcol (0 .. 14_999) {
push @{\$bitcols[\$bitcol]}, \$x if \$xArrays[\$x][\$bitcol];
}
}
\$cur = time - \$start;
print "Binning xArrays took \$cur sec.\n";

cmpthese \$I, {
array => q[
my %top10s;
for my \$x ( 0 .. \$#xArrays ) {
for my \$y ( 0 .. \$#yArrays ) {
my \$count = 0;
\$xArrays[\$x][\$_] == 1 && \$yArrays[\$y][\$_] == 1 and ++\$
+count for 0 .. \$#{ \$xArrays[ 0 ] };
\$top10s{"\$x:\$y"} = \$count;
my \$discard = ( sort{ \$top10s{\$a} <=> \$top10s{\$b}  ||
+\$a cmp \$b} keys %top10s )[ 0 ];
keys( %top10s ) > 10 and delete \$top10s{\$discard};
}
}
\$I == 1 and pp ' arrays: ', %top10s;
],
strings => q[
my %top10s;
for my \$x ( 0 .. \$#xStrings ) {
for my \$y ( 0 .. \$#yStrings ) {
my \$count = ( \$xStrings[\$x] & \$yStrings[\$y] ) =~ tr[1]
+[];
\$top10s{"\$x:\$y"} = \$count;
my \$discard = ( sort{ \$top10s{\$a} <=> \$top10s{\$b}  ||
+\$a cmp \$b} keys %top10s  )[ 0 ];
keys( %top10s ) > 10 and delete \$top10s{\$discard};
}
}
\$I == 1 and pp 'strings: ', %top10s;
],
bits => q[
my %top10s;
for my \$x ( 0 .. \$#xBits ) {
for my \$y ( 0 .. \$#yBits ) {
my \$count = unpack '%32b*', ( \$xBits[\$x] & \$yBits[\$y]
+);
\$top10s{"\$x:\$y"} = \$count;
my \$discard = ( sort{ \$top10s{\$a} <=> \$top10s{\$b}  ||
+\$a cmp \$b} keys %top10s )[ 0 ];
keys( %top10s ) > 10 and delete \$top10s{\$discard};
}
}
\$I == 1 and pp '   bits: ', %top10s;
],
robo => sub {
my %top10s;
my \$cur_min=0;
for my \$y ( 0 .. \$#yArrays2 ) {
my %x;
for my \$bitcol (@{\$yArrays2[\$y]}) {
\$x{\$_}++ for @{\$bitcols[\$bitcol]};
}
for my \$k (keys %x) {
next unless \$x{\$k} >= \$cur_min;
\$top10s{"\$k:\$y"} = \$x{\$k};
my \$discard = ( sort{ \$top10s{\$a} <=> \$top10s{\$b} || \$
+a cmp \$b } keys %top10s )[ 0 ];
if (keys %top10s > 10) {
}
}
}
\$I == 1 and pp '   bits: ', %top10s;
},
};

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re^6: Comparing two arrays by roboticus
in thread Comparing two arrays by baxy77bax

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
• Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
• Read Where should I post X? if you're not absolutely sure you're posting in the right place.
• Posts may use any of the Perl Monks Approved HTML tags:
a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
• You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
 For: Use: & & < < > > [ [ ] ]
• Link using PerlMonks shortcuts! What shortcuts can I use for linking?

Create A New User
Chatterbox?
 [ambrus]: holli: wait, plastic or glass bottle? holli switches OS [holli]: Lidl only sells plastic [Corion]: holli: Oooh, a twister top - so yes, that's a worse state of mind than it sounded ;) [holli]: and since I outed myself anyway as a horrible person, to top it off, the bottle contains beer. alcohol free beer [Discipulus]: nowaday such beers have also a good taste [Corion]: holli: Heh - if you consider alcohol free beer as a not-so-sweet lemonade, you're not that bad off ;)

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (16)
As of 2017-09-25 12:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
During the recent solar eclipse, I:

Results (280 votes). Check out past polls.

Notices?