Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: Identical Arrays

by davido (Archbishop)
on Aug 26, 2012 at 03:57 UTC ( #989781=note: print w/ replies, xml ) Need Help??


in reply to Identical Arrays

This eliminates the sort, instead favoring the efficient operation of constructing a set of unique keys, but assumes elements are unique per array. Essentially we're throwing a little memory at the problem to avoid sorting. As a bonus, we don't even have to care what the elements look like; the hashing handles that for us behind the scenes.

my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my %hash; @hash{ @$left, @$right } = (); return scalar keys %hash == scalar @$left; }

This won't do a deep test. It just tests set equality at face value. If your arrays contain complex data structures you'll need to clarify that point. And again, it assumes each array's elements are unique within that array.

This first tests if the two arrays have an equal number of elements (if they don't, there's no point continuing). Then a hash is constructed where each key represents a set entry. Both arrays contribute to the keys. If the number of keys is greater or less than the number of elements in either one of the arrays, we know that we don't have full equality (or in terms familiar to set theory, we do have a symmetric difference).

Update: If you cannot assume elements are unique per array you're dealing with a "bag" (or multi-set), and this technique won't be reliable. Dealing with a multi-set you would use two hashes, and store a quantity per key. Then you would have to verify both hashes have all the same keys, and in the same quantity. Still not as inefficient as sorting, but does require two passes through the full range of elements. Here's one way to do that:

use List::MoreUtils qw/any/; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my( %left, %right ); $left{$_}++ for @$left; $right{$_}++ for @$right; return 0 if any{ ! exists $left{$_} || ! exists $right{$_} || $left{$_} != $right{$_} } keys %left, keys %right; return 1; }

And another update: The preceding algorithm could be further optimized. There's no need to keep track of two hashes; we only want to make sure that the net tally between the two hashes is 0 per key. We could do that in a single hash like this:

use List::MoreUtils qw/any/; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my( %bag ); $bag{$_}++ for @$left; $bag{$_}-- for @$right; return 0 if any{ $bag{$_} } keys %bag; return 1; }

And now that we've implemented that, we may as well admit we're trying to solve something that has been solved before (many times): We're looking for a condition where there is no symmetric difference between two bags (multisets). Set::Bag works fairly well:

use Set::Bag; my @a = ( 1, 2, 3 ); my @b = ( 2, 3, 1 ); my @c = ( 1, 2, 4 ); print "(@a) and (@b) are equal sets.\n" if identical( \@a, \@b ); print "(@a) and (@c) are equal sets.\n" if identical( \@a, \@c ); print "(@b) and (@c) are equal sets.\n" if identical( \@b, \@c ); sub identical { my ( $left, $right ) = @_; return 0 if scalar @$left != scalar @$right; my $bag_a = Set::Bag->new; my $bag_b = Set::Bag->new; $bag_a->over_delete(1); $bag_b->over_delete(1); $bag_a->insert( $_ => 1 ) for @$left; $bag_b->insert( $_ => 1 ) for @$right; return 1 if $bag_a->difference($bag_b) eq '()' && $bag_b->difference($bag_a) eq '()'; return 0; }

My biggest complaint with Set::Bag is that it stringifies the results, so instead of being happy that $bag_a->difference($bag_b) returns nothing, we instead have to check that $bag_a->difference($bag_b) returns a string that looks like an empty set: '()'. ...and the fact that it's essentially an implementation of the two-hash approach, which can be optimized down to a single-hash.


Dave


Comment on Re: Identical Arrays
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://989781]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (7)
As of 2014-12-25 13:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (160 votes), past polls