in reply to
Re: Fast, Efficient Union and Intersection on arrays

in thread Fast, Efficient Union and Intersection on arrays

It seems counterproductive to grep the b array twice. But it's actually faster than using one for loop.

I did a benchmark, and there is a 20% speed boost for doing two greps over just one for loop with short arrays (less than 10). The boost shrinks as the arrays grow in size. At array size ~50 boost is only about 8%.

`use strict;
use warnings;
use Benchmark qw(cmpthese clearallcache);
my @a = (1..50);
my @b = (3..80);
#two_greps();
#one_for();
#exit;
print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";
cmpthese( 100_000, {
two_greps => \&two_greps,
one_for => \&one_for,
});
clearallcache();
@a = (1..5);
@b = (3..8);
print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";
cmpthese( 100_000, {
two_greps => \&two_greps,
one_for => \&one_for,
});
sub two_greps {
my %hash;
@hash{@a} = (1) x @a;
my @union = @a, grep { !$hash{$_} } @b;
my @intersection = grep { $hash{$_} } @b;
# print "union @union\n";
# print "interstection @intersection\n";
}
sub one_for {
my %hash;
@hash{@a} = (1) x @a;
my @union = @a;
my @intersection;
foreach (@b) {
if( exists $hash{$_} ) {
push @intersection, $_
}
else {
push @union, $_
}
}
# print "union @union\n";
# print "interstection @intersection\n";
}
__DATA__
Range (1,50) vs (3,80)
Rate one_for two_greps
one_for 5829/s -- -8%
two_greps 6318/s 8% --
Range (1,5) vs (3,8)
Rate one_for two_greps
one_for 57143/s -- -17%
two_greps 68823/s 20% --
`

Thanks for posting this thought provoking code.