Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: better intersection of sets algorithm?

by gam3 (Curate)
on Mar 11, 2005 at 20:23 UTC ( [id://438794]=note: print w/replies, xml ) Need Help??


in reply to better union of sets algorithm?

This Code is for intersection for union code see Re^3: better union of sets algorithm?

It seems that you can get a lot of improvement. Here are two tests I ran. The first is using two sets each with about half of /usr/dict/words and the third having the word zoo in it.

Union wineglasses zoo
Union wineglasses zoo
            (warning: too few iterations for a reliable count)
         s/iter   sorted unsorted
sorted     4.34       --     -78%
unsorted  0.956     354%       --
Intersection wineglasses zoo
Intersection wineglasses zoo
The data for this run is
@a1 = qw (a b c d g);
@a2 = qw (a b c e d);
@a3 = qw (a b c f g);

Intersection a b c
Intersection a b c
           Rate unsorted   sorted
unsorted  205/s       --     -93%
sorted   2775/s    1251%       --
Intersection a b c
Intersection a b c
So even with a small data set the sorted method is faster. For both methods the sets must have unique elements (as all set do).
sub intersection_1 { my $count = shift; $count--; my %saw; grep($count == $saw{$_}++, @_); } # This code is based on the SortedMultiList code but works # for an intersection of more than two sets. sub intersection_2 { my $self = [@_]; my ($lowest_arr, $lowest, $next); my @ret = (); my $elements = scalar @_; while (1) { my $count = 0; $lowest = $next; if (!defined($lowest)) { # find min element in queue foreach my $a (@$self) { return @ret unless (@$a); if (!defined($lowest) or ($a->[0] lt $lowest)) { $lowest = $a->[0]; } } } foreach my $a (@$self) { return @ret unless (@$a); if ($a->[0] eq $lowest) { shift @$a; $count++; } else { if (!defined($next) or ($a->[0] lt $next)) { $next = $a->[0]; } } } if ($count == $elements) { push(@ret, $lowest); } $lowest = $next; undef $next; } }
Here is the Benchmark code. How the argments are passed may have some effect on the speed. intersection_2 modifies the arrays so I do the copies to make the test more fair.
cmpthese -10, {
    unsorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_1(3, @x1, @x2, @x3);
    ],
    sorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_2(\@x1, \@x2, \@x3);
    ]
};
The SortedMultiList code is at Re: better union of sets algorithm?.
A picture is worth a thousand words, but takes 200K.

Replies are listed 'Best First'.
Re^2: better union of sets algorithm?
by perrin (Chancellor) on Mar 11, 2005 at 20:46 UTC
    Thanks, but it looks like your code is computing the intersection of the sets, not the union.
      Yes, you are correct. How embaracing. The timing remains about the same though.
      Union_1 96235
      Union_2 96235
                  (warning: too few iterations for a reliable count)
               s/iter   sorted unsorted
      sorted     4.39       --     -66%
      unsorted   1.48     196%       --
      Union_1 96235
      Union_2 96235
      Union a b c d g e f
      Union a b c d e f g
                 Rate unsorted   sorted
      unsorted  202/s       --     -92%
      sorted   2511/s    1144%       --
      Union a b c d g e f
      Union a b c d e f g
      
      sub union_1 { my $count = shift; $count--; my %saw; grep(0 == $saw{$_}++, @_); } sub union_2 { my $self = [@_]; my ($lowest_arr, $lowest, $next); my @ret = (); my $elements = scalar @_; while (1) { my $count = 0; my $work = 0; my $lowest; foreach my $a (@$self) { next unless (@$a); if (!defined($lowest) or ($a->[0] le $lowest)) { $lowest = $a->[0]; } } foreach my $a (@$self) { next unless (@$a); if ($a->[0] eq $lowest) { shift @$a; $work++; } } return @ret unless $lowest; push(@ret, $lowest); } }
      A picture is worth a thousand words, but takes 200K.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://438794]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (2)
As of 2024-03-19 04:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found