Re^2: Fast, Efficient Union and Intersection on arrays

in reply to Re: Fast, Efficient Union and Intersection on arrays
in thread Fast, Efficient Union and Intersection on arrays

It seems counterproductive to grep the b array twice. But it's actually faster than using one for loop.

I did a benchmark, and there is a 20% speed boost for doing two greps over just one for loop with short arrays (less than 10). The boost shrinks as the arrays grow in size. At array size ~50 boost is only about 8%.

use strict;
use warnings;

use Benchmark qw(cmpthese clearallcache);

my @a = (1..50);
my @b = (3..80);

#two_greps();
#one_for();
#exit;

print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";

cmpthese( 100_000, {
    two_greps => \&two_greps,
    one_for   => \&one_for,
});

clearallcache();

@a = (1..5);
@b = (3..8);

print "Range ($a[0],$a[$#a]) vs ($b[0],$b[$#b])\n";
cmpthese( 100_000, {
    two_greps => \&two_greps,
    one_for   => \&one_for,
});



sub two_greps {

        my %hash;
        @hash{@a} = (1) x @a;

        my @union        = @a, grep { !$hash{$_} } @b;
        my @intersection =     grep {  $hash{$_} } @b;

#        print "union @union\n";
#        print "interstection @intersection\n";
}

sub one_for {

        my %hash;
        @hash{@a} = (1) x @a;

        my @union = @a;
        my @intersection;
        foreach (@b) {
            if( exists $hash{$_} ) { 
                push @intersection, $_ 
            }
            else {
                push @union, $_ 
            }
        }

#        print "union @union\n";
#        print "interstection @intersection\n";
}

__DATA__
Range (1,50) vs (3,80)
            Rate   one_for two_greps
one_for   5829/s        --       -8%
two_greps 6318/s        8%        --
Range (1,5) vs (3,8)
             Rate   one_for two_greps
one_for   57143/s        --      -17%
two_greps 68823/s       20%        --
[download]

Thanks for posting this thought provoking code.

TGI says moo

In Section Seekers of Perl Wisdom