This Code is for intersection for union code see
Re^3: better union of sets algorithm?
It seems that you can get a lot of improvement.
Here are two tests I ran. The first is using two sets each with about half of /usr/dict/words and the third having the word zoo in it.
Union wineglasses zoo
Union wineglasses zoo
(warning: too few iterations for a reliable count)
s/iter sorted unsorted
sorted 4.34 -- -78%
unsorted 0.956 354% --
Intersection wineglasses zoo
Intersection wineglasses zoo
The data for this run is
@a1 = qw (a b c d g);
@a2 = qw (a b c e d);
@a3 = qw (a b c f g);
Intersection a b c
Intersection a b c
Rate unsorted sorted
unsorted 205/s -- -93%
sorted 2775/s 1251% --
Intersection a b c
Intersection a b c
So even with a small data set the sorted method is faster.
For both methods the sets must have unique elements (as all
set do).
sub intersection_1
{
my $count = shift;
$count--;
my %saw;
grep($count == $saw{$_}++, @_);
}
# This code is based on the SortedMultiList code but works
# for an intersection of more than two sets.
sub intersection_2
{
my $self = [@_];
my ($lowest_arr, $lowest, $next);
my @ret = ();
my $elements = scalar @_;
while (1) {
my $count = 0;
$lowest = $next;
if (!defined($lowest)) {
# find min element in queue
foreach my $a (@$self)
{
return @ret unless (@$a);
if (!defined($lowest) or ($a->[0] lt $lowest))
{
$lowest = $a->[0];
}
}
}
foreach my $a (@$self)
{
return @ret unless (@$a);
if ($a->[0] eq $lowest)
{
shift @$a;
$count++;
} else {
if (!defined($next) or ($a->[0] lt $next))
{
$next = $a->[0];
}
}
}
if ($count == $elements) {
push(@ret, $lowest);
}
$lowest = $next;
undef $next;
}
}
Here is the Benchmark code. How the argments are passed may have some effect on the speed. intersection_2 modifies the arrays so I do the copies to make the test more fair.
cmpthese -10, {
unsorted => q[
my @x1 = @a1;
my @x2 = @a2;
my @x3 = @a3;
my @out = intersection_1(3, @x1, @x2, @x3);
],
sorted => q[
my @x1 = @a1;
my @x2 = @a2;
my @x3 = @a3;
my @out = intersection_2(\@x1, \@x2, \@x3);
]
};
The SortedMultiList code is at
Re: better union of sets algorithm?.
A picture is worth a thousand words, but takes 200K.
Re^2: better union of sets algorithm?
by perrin (Chancellor) on Mar 11, 2005 at 20:46 UTC
|
Thanks, but it looks like your code is computing the intersection of the sets, not the union. | [reply] [Watch: Dir/Any] |
|
Yes, you are correct. How embaracing.
The timing remains about the same though.
Union_1 96235
Union_2 96235
(warning: too few iterations for a reliable count)
s/iter sorted unsorted
sorted 4.39 -- -66%
unsorted 1.48 196% --
Union_1 96235
Union_2 96235
Union a b c d g e f
Union a b c d e f g
Rate unsorted sorted
unsorted 202/s -- -92%
sorted 2511/s 1144% --
Union a b c d g e f
Union a b c d e f g
sub union_1
{
my $count = shift;
$count--;
my %saw;
grep(0 == $saw{$_}++, @_);
}
sub union_2
{
my $self = [@_];
my ($lowest_arr, $lowest, $next);
my @ret = ();
my $elements = scalar @_;
while (1) {
my $count = 0;
my $work = 0;
my $lowest;
foreach my $a (@$self)
{
next unless (@$a);
if (!defined($lowest) or ($a->[0] le $lowest))
{
$lowest = $a->[0];
}
}
foreach my $a (@$self)
{
next unless (@$a);
if ($a->[0] eq $lowest)
{
shift @$a;
$work++;
}
}
return @ret unless $lowest;
push(@ret, $lowest);
}
}
A picture is worth a thousand words, but takes 200K.
| [reply] [Watch: Dir/Any] [d/l] |