http://www.perlmonks.org?node_id=1010835


in reply to Re: Removing elemets from an array
in thread Removing elemets from an array

Update: Thanks to linuxer for pointing out that the original data were modified by the functions and therefore after the first call the other functions had a smaller data set. Here is the corrected version and the results.

Updated again: realized I had warnings turned on and the nithins code warnings slowed down the benchmark a bunch when they printed to STDERR. Fixed the nithins approach.

As I've seen BrowserUK do before I replaced the toy data set with something more realistic and the nithins approach became the slowest.

I started with 10^6 and then 10^5 elements in the range but got warnings about too few iterations to make a comparison.

#!/usr/bin/perl + use Benchmark qw ( :hireswallclock cmpthese timethese ); use strict; # use warnings; use Set::Scalar; use Data::Dumper; use List::Compare; # our @arcb = ( 450, 625, 720, 645 ); # our @arca = ( 625, 645 ); our @arcb = grep {$_ % 2} (1..10000); our @arca = grep {not $_ % 5} (1..10000); sub davido { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; @arcb = do { my %exclude; @exclude{@arca} = (); grep { !exists $exclude{$_} } @arcb; }; # print "@arcb\n"; } sub karlgoethebier { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my $s1 = Set::Scalar->new(@arcb); my $s2 = Set::Scalar->new(@arca); @arcb = @{ $s1 - $s2 }; # print Dumper( \@arcb ); } sub Lotus1 { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my $lc = List::Compare->new( \@arca, \@arcb ); @arcb = $lc->get_Ronly; # print "arcb: @arcb\n"; } sub LanX { my ( %arcb, %arca ); #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; @arcb{@arcb} = (); # @arca{@arca} = (); update here delete @arcb{@arca}; @arcb = keys %arcb; # print Dumper( \@arcb ); } sub linuxer { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my %unwanted; $unwanted{$_}++ for @arca; @arcb = grep { !$unwanted{$_} } @arcb; # print "@arcb\n"; } sub nithins { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; foreach my $a (@arca) { for ( my $i = 0 ; $i < scalar(@arcb) ; $i++ ) { #delete $arcb[$i] if ( $a == $arcb[$i] ); #fixed the warnings by using splice splice @arcb,$i,1 if ( $a == $arcb[$i] ); } } # print "@arcb"; } # karlgoethebier; # Lotus1; # LanX; # davido; # linuxer; # nithins; my $results = timethese( -20, { 'karlgoethebier' => 'karlgoethebier', 'Lotus1' => 'Lotus1', 'LanX' => 'LanX', 'davido' => 'davido', 'linuxer' => 'linuxer', 'nithins' => 'nithins', } ); cmpthese($results); __END__ Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith +ins for at least 20 CPU seconds... LanX: 21.5303 wallclock secs (21.14 usr + 0.38 sys = 21.52 CPU) + @ 143.25/s (n=3082) Lotus1: 20.4152 wallclock secs (20.27 usr + 0.14 sys = 20.41 CPU) + @ 24.55/s (n=501) davido: 20.1223 wallclock secs (19.72 usr + 0.38 sys = 20.09 CPU) + @ 110.78/s (n=2226) karlgoethebier: 22.1035 wallclock secs (22.06 usr + 0.03 sys = 22.09 +CPU) @ 22.22/s (n=491) linuxer: 21.5713 wallclock secs (21.14 usr + 0.42 sys = 21.56 CPU) + @ 147.39/s (n=3178) nithins: 21.1337 wallclock secs (21.09 usr + 0.00 sys = 21.09 CPU) + @ 0.66/s (n=14) Rate nithins karlgoethebier Lotus1 davido La +nX linuxer nithins 0.664/s -- -97% -97% -99% -10 +0% -100% karlgoethebier 22.2/s 3248% -- -9% -80% -8 +4% -85% Lotus1 24.6/s 3599% 10% -- -78% -8 +3% -83% davido 111/s 16590% 398% 351% -- -2 +3% -25% LanX 143/s 21482% 545% 483% 29% +-- -3% linuxer 147/s 22106% 563% 500% 33% +3% -- Note: the results below are the original version with the modified dat +a sets from the commented out 'our' array declarations. Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith +ins for at least 20 CPU seconds... LanX: 24.0026 wallclock secs (22.73 usr + 0.00 sys = 22.73 CPU) + @ 608.25/s (n=13828) Lotus1: 22.2431 wallclock secs (21.50 usr + 0.30 sys = 21.80 CPU) + @ 27.53/s (n=600) davido: 20.17 wallclock secs (19.81 usr + 0.33 sys = 20.14 CPU) @ + 199.25/s (n=4013) karlgoethebier: 20.2918 wallclock secs (19.95 usr + 0.33 sys = 20.28 +CPU) @ 14.35/s (n=291) linuxer: 20.1558 wallclock secs (19.87 usr + 0.26 sys = 20.14 CPU) + @ 321.91/s (n=6483) nithins: 20.9194 wallclock secs (20.89 usr + 0.00 sys = 20.89 CPU) + @ 0.77/s (n=16) Rate nithins karlgoethebier Lotus1 davido linuxe +r LanX nithins 0.766/s -- -95% -97% -100% -100 +% -100% karlgoethebier 14.3/s 1773% -- -48% -93% -96 +% -98% Lotus1 27.5/s 3494% 92% -- -86% -91 +% -95% davido 199/s 25915% 1289% 624% -- -38 +% -67% linuxer 322/s 41932% 2144% 1069% 62% - +- -47% LanX 608/s 79319% 4139% 2110% 205% 89 +% --

updated with LanX's update

Replies are listed 'Best First'.
Re^3: Removing elemets from an array
by karlgoethebier (Abbot) on Dec 30, 2012 at 11:58 UTC

    This is very interesting. But for the moment i don't have any idea why it is so.

    Perhaps some of the involved brothers would like to comment/interprete the result so we can learn something new?

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Every operation in Perl - like any abstraction - has an overhead which can't be compensated by its C-implementation.

      So tending to use as few as possible perl-commands which do mass-operations means to reduce overhead and delaying the task to highly optimized C.

      Loops (including maps) are just multiplying the amount of executed commands (just imagine the linearized alternative which is even faster as the loop...)

      so my approach is the fastest because its basically reduced to only 3 perl commandsą

      1. setting a hash

      2. deleting a slice from that hash

      3. reading the resulting hash

      OTOH my approach has drawbacks, depending on the task, it's only suitable for real sets of strings.

      Arrays can contain repeated data or other datatypes like refs.

      EDIT: you might be interested in Using hashes for set operations...

      Cheers Rolf

      PS: of course there are still loops working under the hood, but they are already optimized in C.

        ThanX - and especially for the link

        HNY and best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

      Thanks for setting up the benchmark in the first place, it has been fun seeing the performance of the different approaches.

      Nested loops don't scale well. When there were only 2 items for the outer loop it was really fast. I've seen folks here toss around the Big O notation and although I hadn't heard of it before Perlmonks I figured out they were talking about linear growth as opposed to exponential or some other function for the execution time related to the size of the data set.

      The nested loop approach seems to be O(n^2) but I'm still learning so hopefully someone will weigh in if I'm off base.

        Thank you and HNY.

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»