http://www.perlmonks.org?node_id=989781


in reply to Identical Arrays

This eliminates the sort, instead favoring the efficient operation of constructing a set of unique keys, but assumes elements are unique per array. Essentially we're throwing a little memory at the problem to avoid sorting. As a bonus, we don't even have to care what the elements look like; the hashing handles that for us behind the scenes.

my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my %hash; @hash{ @$left, @$right } = (); return scalar keys %hash == scalar @$left; }

This won't do a deep test. It just tests set equality at face value. If your arrays contain complex data structures you'll need to clarify that point. And again, it assumes each array's elements are unique within that array.

This first tests if the two arrays have an equal number of elements (if they don't, there's no point continuing). Then a hash is constructed where each key represents a set entry. Both arrays contribute to the keys. If the number of keys is greater or less than the number of elements in either one of the arrays, we know that we don't have full equality (or in terms familiar to set theory, we do have a symmetric difference).

Update: If you cannot assume elements are unique per array you're dealing with a "bag" (or multi-set), and this technique won't be reliable. Dealing with a multi-set you would use two hashes, and store a quantity per key. Then you would have to verify both hashes have all the same keys, and in the same quantity. Still not as inefficient as sorting, but does require two passes through the full range of elements. Here's one way to do that:

use List::MoreUtils qw/any/; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my( %left, %right ); $left{$_}++ for @$left; $right{$_}++ for @$right; return 0 if any{ ! exists $left{$_} || ! exists $right{$_} || $left{$_} != $right{$_} } keys %left, keys %right; return 1; }

And another update: The preceding algorithm could be further optimized. There's no need to keep track of two hashes; we only want to make sure that the net tally between the two hashes is 0 per key. We could do that in a single hash like this:

use List::MoreUtils qw/any/; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my( %bag ); $bag{$_}++ for @$left; $bag{$_}-- for @$right; return 0 if any{ $bag{$_} } keys %bag; return 1; }

And now that we've implemented that, we may as well admit we're trying to solve something that has been solved before (many times): We're looking for a condition where there is no symmetric difference between two bags (multisets). Set::Bag works fairly well:

use Set::Bag; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my ( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my $bag_a = Set::Bag->new; my $bag_b = Set::Bag->new; $bag_a->over_delete(1); $bag_b->over_delete(1); $bag_a->insert( $_ => 1 ) for @$left; $bag_b->insert( $_ => 1 ) for @$right; return 1 if $bag_a->difference($bag_b) eq '()' && $bag_b->difference($bag_a) eq '()'; return 0; }

My biggest complaint with Set::Bag is that it stringifies the results, so instead of being happy that $bag_a->difference($bag_b) returns nothing, we instead have to check that $bag_a->difference($bag_b) returns a string that looks like an empty set: '()'. ...and the fact that it's essentially an implementation of the two-hash approach, which can be optimized down to a single-hash.


Dave