Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Find unique elements in an array

by kvale (Monsignor)
on Apr 04, 2004 at 07:52 UTC ( #342435=note: print w/replies, xml ) Need Help??


in reply to Find unique elements in an array

Here is a faster solution, using a hash slice:
use Benchmark qw(:all) ; my @a; push @a, int (rand(10)) foreach 1..100; my %unique; my @awd; cmpthese($1000, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd = keys %unique; }, });
The results are
Benchmark: running jc, mk for at least 3 CPU seconds... jc: 3 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 53 +5.67/s (n=20522) mk: 4 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 14 +937.00/s (n=48844) Rate jc mk jc 6536/s -- -56% mk 14937/s 129% --
Update: $1000 is a typo should be 1000. I'll keep it as is for comparisons with the following nodes. Thanks BrowserUK and tilly!

-Mark

Replies are listed 'Best First'.
Re: Re: Find unique elements in an array
by tilly (Archbishop) on Apr 04, 2004 at 20:25 UTC
    Calling undef on a hash slice is faster still.
    use Benchmark qw(:all) ; my @a; push @a, int (rand(10)) foreach 1..100; my %unique; my @awd; cmpthese($1000, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd = keys %unique; }, 'bt' => sub { undef(@unique{@a}); @awd = keys %unique; }, });
    gives me
    Rate jc mk bt jc 16556/s -- -54% -62% mk 36080/s 118% -- -18% bt 44007/s 166% 22% --
    (Incidentally that is a truely bizarre indent style.)

      What is the purpose of using undef as the iteration count for cmpthese( $1000, ... );?

      Is that what causes the warnings?

      Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 831. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 838. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 840. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776.

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
        You'll have to ask kvale that question, not me. I copied everything except the last test from him.

        I'd guessed that he meant to type "1000" and messed up. I noticed wondered about that as I was copying it back, but didn't care enough to modify the code and re-run it...

Re: Re: Find unique elements in an array
by ysth (Canon) on Apr 04, 2004 at 13:35 UTC
    saying = () instead of = 1 is slightly faster, but the hash slice approach may not scale well with large @a's, since the whole array needs to be placed on the stack at once.
      If there is a stack penalty, it is not terrible, as the routine get faster with large arrays:
      use Benchmark qw(:all) ; my @a; push @a, int (rand(100)) foreach 1..2_000_000; my %unique; my (@awd1, @awd2, @awd3); cmpthese(5, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd1 = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd2 = keys %unique; }, 'ys' => sub { @unique{ @a} = (); @awd3 = keys %unique; }, });
      yields
      Benchmark: timing 5 iterations of jc, mk, ys... jc: 19 wallclock secs (16.75 usr + 0.35 sys = 17.10 CPU) @ 0 +.29/s (n=5) mk: 6 wallclock secs ( 6.00 usr + 0.01 sys = 6.01 CPU) @ 0 +.83/s (n=5) ys: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU) @ 0 +.83/s (n=5) s/iter jc mk ys jc 3.42 -- -65% -65% mk 1.20 185% -- -0% ys 1.20 185% 0% --
      The = () optimization does not seem to make much difference.

      -Mark

        With your original benchmark, I saw a consistent 4-5% increase for (). Obviously this is a constant difference that disappears into the woodwork with larger slices.
        Anyway @unique{@a} = 1; looks a bit curious. Why do we set to 1 the only value in a huge hash?
Re: Re: Find unique elements in an array
by kappa (Chaplain) on Apr 04, 2004 at 21:04 UTC
    As per ccn's comment below, @unique{@a} = 1 looks weird and effectively equals to this: $unique{$a[0]} = 1; undef(@unique{@a[1..$#a]});, which, BTW, is good, as, according to tilly, storing lots of undefs instead of 1's in a hash to create keys is a Good Thing(tm) for performance.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://342435]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (6)
As of 2020-02-17 10:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What numbers are you going to focus on primarily in 2020?










    Results (71 votes). Check out past polls.

    Notices?