Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Comparing Arrays

by eak (Monk)
on Sep 01, 2000 at 22:19 UTC ( #30771=snippet: print w/replies, xml ) Need Help??
Description: I just thought I would share this little snippet to check whethere arrays are exactly the same. It is very fast compared to looping through each and every element. It assumes that your arrays are in the same order. If they aren't, you would have to sort in both the for both the loop and the join routine, so the join would still be faster. I have included the loop version for those of you that may want to run the benchmark.
Here are my benchmark results: join: 2 wallclock secs ( 1.68 usr + 0.00 sys = 1.68 CPU) loop: 8 wallclock secs ( 7.60 usr + 0.02 sys = 7.62 CPU)
--eak
#!/usr/bin/perl -w
use Benchmark;
 
 
 my @a1 = qw(3 99 4 33 43 98 83 64 3 99 4 33 43 98 83 64 99 4 33 43 98
+ 83 64);
 
 
 my @a2 = qw(3 99 4 33 43 98 83 64 3 99 4 33 43 98 83 64 99 4 33 43 98
+ 83 64);
 
 
sub join_eq {
   return join('',@a1) eq join('', @a2);
}
 
sub loop_eq {
  my $rc = 1;
  if ($#a1 != $#a2) { $rc = 0; }
  else {
    for (my $i=0; $i<=$#a1; $i++) {
      $rc = 0 if $a1[$i] ne $a2[$i];
    }
 }
 return $rc;
}
 
 
timethese(100000, {join => \&join_eq, loop => \&loop_eq});
Replies are listed 'Best First'.
RE: Comparing Arrays
by lhoward (Vicar) on Sep 01, 2000 at 22:41 UTC
    Your join method has a bug. It considers the following 2 lists to be identical:
    @a1=qw(3 5 7); @a2=qw(357);
    It is because your join makes both of those into the string "357" for comparison.
RE: Comparing Arrays
by Boogman (Scribe) on Sep 01, 2000 at 22:45 UTC
    Just figured I'd add another way to do it...
    sub quote_eq { return "@a1" eq "@a2"; }
    And the results were....
    Benchmark: timing 100000 iterations of join, loop, quote... join: 3 wallclock secs ( 2.30 usr + 0.00 sys = 2.30 CPU) @ 43 +402.78/s (n=100000) loop: 17 wallclock secs (16.03 usr + 0.00 sys = 16.03 CPU) @ 62 +37.52/s (n=100000) quote: 5 wallclock secs ( 3.79 usr + 0.00 sys = 3.79 CPU) @ 26 +420.08/s (n=100000)
    join remains in the lead...

    Update: Of course this has the same bug that lhoward pointed out. Assuming lists of single character elements though... :)

    Update 2: Alright, so not the same bug, but a similar bug - as btrott points out... i.e.
    my @a1 = ( "3 5 7", "9 11 13" ); my @a2 = qw( 3 5 7 9 11 13 );
    result in the same string. So this algorithm can be used to compare arrays where the elements don't have whitespace... any array created with the qw operator...
      No, that doesn't have the same bug. The bug in the other code was because the join was done on the empty string, which makes the list (3, 5, 7) collapse to "357". When you quote an array, the implicit join is done on the special character $", which by default is a space. So:
      my @a1 = qw/357/; my @a2 = qw/3 5 7/; print "@a1\n"; print "@a2\n";
      Gives
      357 3 5 7
      Different strings.

      You could change $" to be something weirder when doing the comparison; something less likely to be found in a string normally. Like $;:

      sub areq { local $" = $;; "@{$_[0]}" eq "@{$_[1]}"; }
        Either way it'll still contain a very similar error. If you pick a "separator" you must guarantee that it won't appear in any of the list element in order to prevent this error. This "string" method can still be shown to perform improperly with certain arrays:
        my @a1=('3 5 7'); my @a2=('3','5','7');
RE: Comparing Arrays
by davorg (Chancellor) on Sep 02, 2000 at 00:17 UTC

    Or, you could use the Array::Compare module from CPAN which basically does what you do, but allows you to choose the join character so that you avoid the problems that others have pointed out.

    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000, ICA, London
    <http://www.yapc.org/Europe/>
RE: Comparing Arrays
by b (Beadle) on Sep 01, 2000 at 23:07 UTC
    The for-loop goes through the whole array while the 'eq' probably short-circuits as soon as possible.
RE: Comparing Arrays
by Adam (Vicar) on Sep 01, 2000 at 23:08 UTC
    Your loop func isn't quite fair. It should be short-circuited:
    sub loop_eq { return 0 if $#a1 != $#a2; for(0..$#a1) { return 0 if $a1[$_] ne $a2[$_] } return 1; }
    Also, try not to write subroutines that use global variables, make them self contained.
    # loop_eq( \@array1, \@array2 ) sub loop_eq { return 0 if @$_[0] != @$_[1]; for(0..$#{$_[0]}) { return 0 if $_[0]->[$_] ne $_[1]->[$_] } return 1; }
RE: Comparing Arrays
by Boogman (Scribe) on Sep 01, 2000 at 23:58 UTC
    Figured I'd fix up the code and do more realistic benchmarks... code and results follow.
    #!/usr/bin/perl -w use Benchmark; @a1 = qw(3 99 4 33 43 98 83 64 3 99 4 33 43 98 83 64 99 4 33 43 98 83 +64); @a2 = qw(3 99 4 33 43 98 83 64 3 99 4 11 43 98 83 64 99 4 33 43 98 83 +64); #middle different @a3 = qw(1 99 4 33 43 98 83 64 3 99 4 11 43 98 83 64 99 4 33 43 98 83 +64); #first different @a4 = qw(1); # different size sub join_eq { return join('',@{$_[0]}) eq join('', @{$_[1]} ); } sub quote_eq { return "@{$_[0]}" eq "@{$_[1]}"; } sub better_quote { if ("@{$_[0]}" eq "@{$_[1]}") { local $" = 'a'; return 1 if ("@{$_[0]}" eq "@{$_[1]}"); } return 0; } sub loop_eq { my ( $a1, $a2 ) = @_; return 0 if ($#$a1 != $#$a2); for ( 0 .. $#a1 ) { return 0 if ( $$a1[$_] ne $$a2[$_] ); } return 1; } print "\nEqual sized strings.\n"; timethese(-2, { join => q{ join_eq( \@a1, \@a1 ) }, quote => q{ quote_eq( \@a1, \@a1 ) }, bquoteSame => q{ better_quote( \@a1, \@a1 ) }, bquoteDiff => q{ better_quote( \@a1, \@a2 ) }, loopSame => q{ loop_eq( \@a1, \@a1 ) }, loopMid => q{ loop_eq( \@a1, \@a2 ) }, loopBeg => q{ loop_eq( \@a1, \@a3 ) } } ); print "\nDifferent sized.\n"; timethese(-2, { join => q{ join_eq( \@a1, \@a4 ) }, quote => q{ quote_eq( \@a1, \@a4 ) }, bquote => q{ quote_eq( \@a1, \@a4 ) }, loop => q{ loop_eq( \@a1, \@a4 ) } } );
    Results...
    Equal sized strings. Benchmark: running bquoteDiff, bquoteSame, join, loopBeg, loopMid, loo +pSame, quote, each for at least 2 CPU seconds... bquoteDiff: 2 wallclock secs ( 2.02 usr + 0.00 sys = 2.02 CPU) @ 21153.73/s (n=42794) bquoteSame: 2 wallclock secs ( 2.05 usr + 0.00 sys = 2.05 CPU) @ 10422.80/s (n=21398) join: 3 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 31310.51/s (n=65846) loopBeg: 2 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU) @ 30084.42/s (n=64501) loopMid: 2 wallclock secs ( 2.08 usr + 0.00 sys = 2.08 CPU) @ 13698.51/s (n=28534) loopSame: 2 wallclock secs ( 2.09 usr + 0.00 sys = 2.09 CPU) @ 8104.16/s (n=16962) quote: 3 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 20563.53/s (n=43862) Different sized. Benchmark: running bquote, join, loop, quote, each for at least 2 CPU +seconds... bquote: 2 wallclock secs ( 2.24 usr + 0.00 sys = 2.24 CPU) @ 31813.73/s (n=71390) join: 1 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 44310.68/s (n=94559) loop: 3 wallclock secs ( 2.00 usr + 0.00 sys = 2.00 CPU) @ 57423.08/s (n=114961) quote: 1 wallclock secs ( 2.11 usr + 0.00 sys = 2.11 CPU) @ 32110.27/s (n=67849)
RE: Comparing Arrays
by fundflow (Chaplain) on Sep 02, 2000 at 02:03 UTC
    I've added a node that generalizes this a bit, check any-all (was inspired by this discussion).
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: snippet [id://30771]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2022-01-21 00:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (57 votes). Check out past polls.

    Notices?