http://www.perlmonks.org?node_id=342467


in reply to Re: Find unique elements in an array
in thread Find unique elements in an array

saying = () instead of = 1 is slightly faster, but the hash slice approach may not scale well with large @a's, since the whole array needs to be placed on the stack at once.

Replies are listed 'Best First'.
Re: Re: Re: Find unique elements in an array
by kvale (Monsignor) on Apr 04, 2004 at 15:52 UTC
    If there is a stack penalty, it is not terrible, as the routine get faster with large arrays:
    use Benchmark qw(:all) ; my @a; push @a, int (rand(100)) foreach 1..2_000_000; my %unique; my (@awd1, @awd2, @awd3); cmpthese(5, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd1 = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd2 = keys %unique; }, 'ys' => sub { @unique{ @a} = (); @awd3 = keys %unique; }, });
    yields
    Benchmark: timing 5 iterations of jc, mk, ys... jc: 19 wallclock secs (16.75 usr + 0.35 sys = 17.10 CPU) @ 0 +.29/s (n=5) mk: 6 wallclock secs ( 6.00 usr + 0.01 sys = 6.01 CPU) @ 0 +.83/s (n=5) ys: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU) @ 0 +.83/s (n=5) s/iter jc mk ys jc 3.42 -- -65% -65% mk 1.20 185% -- -0% ys 1.20 185% 0% --
    The = () optimization does not seem to make much difference.

    -Mark

      With your original benchmark, I saw a consistent 4-5% increase for (). Obviously this is a constant difference that disappears into the woodwork with larger slices.
      Anyway @unique{@a} = 1; looks a bit curious. Why do we set to 1 the only value in a huge hash?