note
TGI
<p>It seems counterproductive to grep the b array twice. But it's actually faster than using one for loop.
<p>I did a benchmark, and there is a 20% speed boost for doing two greps over just one for loop with short arrays (less than 10). The boost shrinks as the arrays grow in size. At array size ~50 boost is only about 8%.
<c>
use strict;
use warnings;
use Benchmark qw(cmpthese clearallcache);
my @a = (1..50);
my @b = (3..80);
#two_greps();
#one_for();
#exit;
print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";
cmpthese( 100_000, {
two_greps => \&two_greps,
one_for => \&one_for,
});
clearallcache();
@a = (1..5);
@b = (3..8);
print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";
cmpthese( 100_000, {
two_greps => \&two_greps,
one_for => \&one_for,
});
sub two_greps {
my %hash;
@hash{@a} = (1) x @a;
my @union = @a, grep { !$hash{$_} } @b;
my @intersection = grep { $hash{$_} } @b;
# print "union @union\n";
# print "interstection @intersection\n";
}
sub one_for {
my %hash;
@hash{@a} = (1) x @a;
my @union = @a;
my @intersection;
foreach (@b) {
if( exists $hash{$_} ) {
push @intersection, $_
}
else {
push @union, $_
}
}
# print "union @union\n";
# print "interstection @intersection\n";
}
__DATA__
Range (1,50) vs (3,80)
Rate one_for two_greps
one_for 5829/s -- -8%
two_greps 6318/s 8% --
Range (1,5) vs (3,8)
Rate one_for two_greps
one_for 57143/s -- -17%
two_greps 68823/s 20% --
</c>
<p>Thanks for posting this thought provoking code.
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-25825">
<P><BR>TGI says <B>moo</B></P>
</div></div>
724918
724920