note
Kenosis
<p>Given the OP's two arrays containing approximately 8k unique integer elements, it's evident from [http://perldoc.perl.org/Benchmark.html|benchmark]ing (Perl v5.14.2 64-bit) that using a hash and grep to find the elements in @arr_1 that are not in @arr_2 may be a good choice:</p>
<c>
use strict;
use warnings;
use Set::Scalar;
use List::Compare;
use Benchmark qw/cmpthese/;
my @arr_1 = 0 .. 8e3;
my @arr_2 = 2e3 .. 1e4;
sub setScalar {
my $s1 = Set::Scalar->new(@arr_1);
my $s2 = Set::Scalar->new(@arr_2);
my $diff = $s1->difference($s2);
}
sub listCompare {
my $lc = List::Compare->new( \@arr_1, \@arr_2 );
my @diff = $lc->get_Lonly;
}
sub OPdiff {
my %diff3;
@diff3{@arr_1} = @arr_1;
delete @diff3{@arr_2};
my @diff = ( keys %diff3 );
}
sub OPdiffModified {
my %diff3;
@diff3{@arr_1} = ();
delete @diff3{@arr_2};
my @diff = ( keys %diff3 );
}
sub OPdiff_undef {
my %diff3;
undef @diff3{@arr_1};
delete @diff3{@arr_2};
my @diff = ( keys %diff3 );
}
sub using_vec {
my $vec = '';
vec( $vec, $_, 1 ) = 1 for @arr_2;
my @diff = grep !vec( $vec, $_, 1 ), @arr_1;
}
sub hash_grep {
my %arr_2_hash;
undef @arr_2_hash{@arr_2};
my @diff = grep !exists $arr_2_hash{$_}, @arr_1;
}
cmpthese(
-5,
{
setScalar => sub { setScalar() },
listCompare => sub { listCompare() },
OPdiff => sub { OPdiff() },
OPdiffModified => sub { OPdiffModified() },
OPdiff_undef => sub { OPdiff_undef() },
using_vec => sub { using_vec() },
hash_grep => sub { hash_grep() }
}
);
</c>
<p>Output:</p>
<c>
Rate setScalar listCompare using_vec OPdiff OPdiffModified hash_grep OPdiff_undef
setScalar 7.94/s -- -72% -98% -98% -98% -98% -99%
listCompare 28.1/s 254% -- -92% -93% -94% -94% -96%
using_vec 349/s 4289% 1139% -- -7% -27% -32% -47%
OPdiff 375/s 4623% 1233% 8% -- -22% -26% -42%
OPdiffModified 478/s 5919% 1599% 37% 27% -- -6% -27%
hash_grep 510/s 6317% 1712% 46% 36% 7% -- -22%
OPdiff_undef 652/s 8105% 2217% 87% 74% 36% 28% -- --
</c>
<p><strong>Edit I:</strong> My thanks to [http://perlmonks.org/index.pl?node_id=708738|LanX] for redirecting my attention back to the suggested <c>@diff3{@arr_1} = ()</c>, as it makes a significant difference in performance, as shown in the <c>OPdiffModified()</c> benchmark results. Thus, based upon benchmarking for this task, <c>OPdiffModified()</c> is the best of this group of diff solutions for the OP.</p>
<p><strong>Edit II:</strong> Thanks again to [http://perlmonks.org/index.pl?node_id=708738|LanX] (OK, I'll set up a new node for thanking LanX :), substituted <c>@diff3{@arr_1} = ()</c> with <c>undef @diff3{@arr_1}</c> in <c>OPdiff_undef()</c>, and it's now the fastest.</p>
1064178
1064178