### Re^2: Comparing two arrays

by baxy77bax (Chaplain)
 on Dec 15, 2013 at 12:58 UTC

in reply to Re: Comparing two arrays

Thank you so much for the code and the benchmark; after seeing this I'll try to implement the strategy. However, what I'm wondering now is where the speed comes from. I remember reading somewhere that when you search for a certain bit in a bit-string, the bit is found by iterating through the memory block, whereas accessing an array element is constant time. Is it possible that these constants are so large that it is cheaper to scan linearly through memory blocks, or did I mix something up (which is probably the case)? Could you please educate me a "bit" :)

Thank you

baxy

Re^3: Comparing two arrays
by BrowserUk (Pope) on Dec 15, 2013 at 13:47 UTC
i'm wondering now is where does the speed come from.

Perhaps the simplest way to demonstrate the difference is to look at the number of opcodes generated in order to compare and count two sets of 64 bits stored as: two arrays; two strings of ASCII 1s and 0s; two bitstrings of 64 bits each. You don't need to understand the opcodes to see the reduction.

Moving as much of the work (the looping) as possible into the optimised, compiled-C opcodes saves huge swaths of time and processor cycles:

1. Arrays:
```
C:\test>perl -MO=Terse -E"@a=map{int rand 2}1..64;@b=map{int rand 2}1..64; for my$a(@a){ for my $b(@b){ $a==$b and ++$count }}"
LISTOP (0x34e7c58) leave [1]
OP (0x34eec40) enter
COP (0x34e7c98) nextstate
BINOP (0x34e7d00) aassign [9]
UNOP (0x34e7d70) null [142]
OP (0x34e7d40) pushmark
LOGOP (0x34e7e90) mapwhile [8]
LISTOP (0x34e7f00) mapstart
OP (0x34e7ed0) pushmark
UNOP (0x34e7e58) null
UNOP (0x34e7f40) null
LISTOP (0x34e80d0) scope
OP (0x34e8110) null [177]
UNOP (0x34e8178) int [4]
UNOP (0x34e81b0) rand [3]
SVOP (0x34e81e8) const [7] IV (0x33cca88) 2
UNOP (0x34e7f78) rv2av
SVOP (0x34e7e20) const [26] AV (0x33c7570)
UNOP (0x34e7de0) null [142]
OP (0x34e7db0) pushmark
UNOP (0x34e8220) rv2av [2]
PADOP (0x34e8258) gv  GV (0xa76c8) *a
COP (0x34e7660) nextstate
BINOP (0x34e76c8) aassign [18]
UNOP (0x34e7738) null [142]
OP (0x34e7708) pushmark
LOGOP (0x34e7858) mapwhile [17]
LISTOP (0x34e78c8) mapstart
OP (0x34e7898) pushmark
UNOP (0x34e7820) null
UNOP (0x34e7908) null
LISTOP (0x34e7a98) scope
UNOP (0x34e7b40) int [13]
UNOP (0x34e7b78) rand [12]
SVOP (0x34e7bb0) const [16] IV (0x33c6e30) 2
UNOP (0x34e7940) rv2av
SVOP (0x34e77e8) const [27] AV (0x33c6830)
UNOP (0x34e77a8) null [142]
OP (0x34e7778) pushmark
UNOP (0x34e7be8) rv2av [11]
PADOP (0x34e7c20) gv  GV (0x33c6f40) *b
COP (0x34eecb0) nextstate
BINOP (0x34eed18) leaveloop
LOOP (0x34eee30) enteriter [19]
OP (0x34eee88) null [3]
UNOP (0x34eef28) null [142]
OP (0x34eeef8) pushmark
UNOP (0x34ef568) rv2av [21]
PADOP (0x34e75b8) gv  GV (0xa76c8) *a
UNOP (0x34eed58) null
LOGOP (0x34eed90) and
OP (0x34eee00) iter
LISTOP (0x34eef68) lineseq
COP (0x34eefa8) nextstate
BINOP (0x34ef010) leaveloop
LOOP (0x34ef128) enteriter [22]
OP (0x34ef180) null [3]
UNOP (0x34ef220) null [142]
OP (0x34ef1f0) pushmark
UNOP (0x34ef4c8) rv2av [24]
+0) *b
UNOP (0x34ef050) null
LOGOP (0x34ef088) and
OP (0x34ef0f8) iter
LISTOP (0x34ef260) lineseq
COP (0x34ef2a0) nextstate
UNOP (0x34ef308) null
LOGOP (0x34ef340) and
BINOP (0x34ef428) eq
+19]
+22]
UNOP (0x34ef380) preinc
UNOP (0x34ef3b8) null [15]
+gvsv  GV (0x33c5ed0) *count
OP (0x34ef0c8) unstack
OP (0x34eedd0) unstack
-e syntax OK
```

2. Strings:
```
C:\test>perl -MO=Terse -E"$a=join'',map{int rand 2}1..64;@b=map{int rand 2}1..64; $count=($a&$b)=~tr[1][]"
LISTOP (0x3447bc0) leave [1]
OP (0x344f178) enter
COP (0x3447c00) nextstate
BINOP (0x3447c68) sassign
LISTOP (0x3447cd8) join [8]
OP (0x3447ca8) pushmark
SVOP (0x3448118) const [22] PV (0x332ca20) ""
LOGOP (0x3447d88) mapwhile [7]
LISTOP (0x3447df8) mapstart
OP (0x3447dc8) pushmark
UNOP (0x3447d50) null
UNOP (0x3447e38) null
LISTOP (0x3447fc8) scope
OP (0x3448008) null [177]
UNOP (0x3448070) int [3]
UNOP (0x34480a8) rand [2]
SVOP (0x34480e0) const [6] IV (0x332cb58) 2
UNOP (0x3447e70) rv2av
SVOP (0x3447d18) const [23] AV (0x3327640)
UNOP (0x3448150) null [15]
PADOP (0x3448188) gvsv  GV (0xa76a8) *a
COP (0x34475c8) nextstate
BINOP (0x3447630) aassign [17]
UNOP (0x34476a0) null [142]
OP (0x3447670) pushmark
LOGOP (0x34477c0) mapwhile [16]
LISTOP (0x3447830) mapstart
OP (0x3447800) pushmark
UNOP (0x3447788) null
UNOP (0x3447870) null
LISTOP (0x3447a00) scope
OP (0x3447a40) null [177]
UNOP (0x3447aa8) int [12]
UNOP (0x3447ae0) rand [11]
SVOP (0x3447b18) const [15] IV (0x3326f00) 2
UNOP (0x34478a8) rv2av
SVOP (0x3447750) const [24] AV (0x3326900)
UNOP (0x3447710) null [142]
OP (0x34476e0) pushmark
UNOP (0x3447b50) rv2av [10]
PADOP (0x3447b88) gv  GV (0x3327010) *b
COP (0x344f1e8) nextstate
BINOP (0x344f250) sassign
UNOP (0x344f290) null
BINOP (0x344f3e8) bit_and [21]
UNOP (0x344f498) null [15]
PADOP (0x34474e0) gvsv  GV (0xa76a8) *a
UNOP (0x344f428) null [15]
PADOP (0x344f460) gvsv  GV (0x3327010) *b
PVOP (0x344f3b0) trans
UNOP (0x3447518) null [15]
PADOP (0x3447550) gvsv  GV (0x33262d0) *count
-e syntax OK
```

3. Bits:
```
C:\test>perl -MO=Terse -E"$a=int rand 2**64;$b=int rand 2**64; $count= unpack '%32b*', $a & $b"
LISTOP (0x33e7460) leave [1]
OP (0x33e6e60) enter
COP (0x33e74a0) nextstate
BINOP (0x33e7508) sassign
UNOP (0x33e7548) int [4]
UNOP (0x33e7580) rand [3]
SVOP (0x33e75b8) const [13] NV (0x32ca498) 1.84467440737096e+019
UNOP (0x33e76a0) null [15]
PADOP (0x33e76d8) gvsv  GV (0x107668) *a
COP (0x33e71f0) nextstate
BINOP (0x33e7258) sassign
UNOP (0x33e7298) int [8]
UNOP (0x33e72d0) rand [7]
SVOP (0x33e7308) const [14] NV (0x32ca5a0) 1.84467440737096e+019
UNOP (0x33e73f0) null [15]
PADOP (0x33e7428) gvsv  GV (0x32ca510) *b
COP (0x33e6ed0) nextstate
BINOP (0x33e6f38) sassign
LISTOP (0x33e6fa8) unpack
OP (0x33e6f78) null [3]
SVOP (0x33e7108) const [15] PV (0x32ca600) "%32b*"
BINOP (0x33e6fe8) bit_and [12]
UNOP (0x33e7098) null [15]
PADOP (0x33e70d0) gvsv  GV (0x107668) *a
UNOP (0x33e7028) null [15]
PADOP (0x33e7060) gvsv  GV (0x32ca510) *b
UNOP (0x33e7140) null [15]
PADOP (0x33e7178) gvsv  GV (0x32ca5d0) *count
-e syntax OK
```
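
For anyone who wants to check that the three representations give the same answer, here is a minimal sketch. It is not the one-liners above: it counts positions that are 1 in both inputs, and the names `@x`, `@y`, `$by_array`, `$by_string` and `$by_bits` are made up for illustration. The only Perl-level loop is in the array version; the other two hand the whole job to a single `&` plus `tr` or `unpack`.

```perl
use strict;
use warnings;

# Two hypothetical 64-element 0/1 arrays (fixed seed so runs repeat).
srand 42;
my @x = map { int rand 2 } 1 .. 64;
my @y = map { int rand 2 } 1 .. 64;

# 1. Arrays: one Perl-level loop iteration per element.
my $by_array = 0;
$x[$_] && $y[$_] and ++$by_array for 0 .. $#x;

# 2. ASCII strings: string '&' gives '1' only where both have '1';
#    tr with an empty replacement list just counts the matches.
my ( $sx, $sy ) = ( join( '', @x ), join( '', @y ) );
my $and_string = $sx & $sy;
my $by_string  = $and_string =~ tr/1//;

# 3. Packed bitstrings: 64 bits become 8 bytes; '%32b*' sums the set bits.
my ( $bx, $by ) = ( pack( 'b*', $sx ), pack( 'b*', $sy ) );
my $by_bits = unpack '%32b*', $bx & $by;

print "array=$by_array string=$by_string bits=$by_bits\n";
```

All three counts should agree; only the amount of work done in Perl opcodes versus inside a single C-level op differs.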

Re^3: Comparing two arrays
by hdb (Prior) on Dec 15, 2013 at 14:03 UTC

Just be careful to create your data as bitstrings in the first place. If you create arrays and then turn them into bitstrings to do the comparison, then it is not that fast:

```
use strict;
use warnings;
use Benchmark 'cmpthese';

# Build a list of $_[0] random 0/1 values, each 1 with probability $_[1].
sub create { map { rand() < $_[1] ? 1 : 0 } 1 .. $_[0] }

sub compare2a {    # first find 1s in x, then check in ys
    my $x   = shift;
    my $n   = shift;
    my @nxs = grep { $x->[$_] } 0 .. $n - 1;
    return map { scalar grep { $_ } @{$_}[@nxs] } @_;
}

sub compare4 {     # bitstrings, packed from the arrays on every call
    my $x = shift;
    $x = pack 'b*', join '', @$x;
    return map { unpack '%32b*', ( $x & pack 'b*', join '', @$_ ) } @_;
}

my $n  = 15000;
my $p  = 0.005;
my $ny = 10;
my @x  = create $n, $p;
my @ys = map { [ create $n, $p ] } 1 .. $ny;

my @r2a = compare2a \@x, $n, @ys;
my @r4  = compare4 \@x, @ys;
print "compare2a: @r2a\n";
print "compare4:  @r4\n";

cmpthese(
    -5,
    {
        compare2a => sub { compare2a \@x, $n, @ys },
        compare4  => sub { compare4 \@x, @ys },
    }
);
```
Result:
```
           Rate  compare4 compare2a
compare4  246/s        --      -55%
compare2a 543/s      120%        --
```
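
For contrast, a rough sketch of creating the data as bitstrings in the first place, so that nothing is packed inside the comparison. `create_bits` and `compare_bits` are hypothetical names, not part of the benchmark above, and I have not timed this variant, so treat it as an illustration of the idea rather than a measured result.

```perl
use strict;
use warnings;

# Hypothetical helper: build an $n-bit string in which each bit is set
# with probability $p, writing bits with vec() (no intermediate array).
sub create_bits {
    my ( $n, $p ) = @_;
    my $bits = "\0" x int( ( $n + 7 ) / 8 );    # pre-size to whole bytes
    rand() < $p and vec( $bits, $_, 1 ) = 1 for 0 .. $n - 1;
    return $bits;
}

# Count the positions set in both $x and each remaining argument.
# The comparison is only a string '&' and a bit-count checksum.
sub compare_bits {
    my $x = shift;
    return map { unpack '%32b*', $x & $_ } @_;
}

srand 1;
my ( $n, $p, $ny ) = ( 15000, 0.005, 10 );
my $x  = create_bits( $n, $p );
my @ys = map { create_bits( $n, $p ) } 1 .. $ny;

my @counts = compare_bits( $x, @ys );
print "common bits: @counts\n";
```

The packing cost is paid once per data set in `create_bits`, instead of once per call as in `compare4`.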
If you create arrays and then turn them into bitstrings [ every time ] to do the comparison, then it is not that fast:

No shit Sherlock :)

