Re^2: Removing elemets from an array

in reply to Re: Removing elemets from an array
in thread Removing elemets from an array

Update: Thanks to linuxer for pointing out that the original data were modified by the functions and therefore after the first call the other functions had a smaller data set. Here is the corrected version and the results.

Updated again: realized I had warnings turned on and the nithins code warnings slowed down the benchmark a bunch when they printed to STDERR. Fixed the nithins approach.

As I've seen BrowserUK do before I replaced the toy data set with something more realistic and the nithins approach became the slowest.

I started with 10^6 and then 10^5 elements in the range but got warnings about too few iterations to make a comparison.

#!/usr/bin/perl                                                       
+                         
use Benchmark qw ( :hireswallclock cmpthese timethese );
use strict;
# use warnings;
use Set::Scalar;
use Data::Dumper;
use List::Compare;

# our @arcb = ( 450, 625, 720, 645 );
# our @arca = ( 625, 645 );

our @arcb = grep {$_ % 2}     (1..10000);
our @arca = grep {not $_ % 5} (1..10000);

sub davido {
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;

    @arcb = do {
        my %exclude;
        @exclude{@arca} = ();
        grep { !exists $exclude{$_} } @arcb;
    };
    # print "@arcb\n";
}

sub karlgoethebier {
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;
    my $s1   = Set::Scalar->new(@arcb);
    my $s2   = Set::Scalar->new(@arca);
    @arcb = @{ $s1 - $s2 };
    # print Dumper( \@arcb );
}

sub Lotus1 {
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;
    my $lc = List::Compare->new( \@arca, \@arcb );
    @arcb = $lc->get_Ronly;
    # print "arcb: @arcb\n";
}

sub LanX {
    my ( %arcb, %arca );
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;
    @arcb{@arcb} = ();
    # @arca{@arca} = ();  update here
    delete @arcb{@arca};
    @arcb = keys %arcb;
    # print Dumper( \@arcb );
}

sub linuxer {
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;
    my %unwanted; 
    $unwanted{$_}++ for @arca;
    @arcb = grep { !$unwanted{$_} } @arcb;
    # print "@arcb\n";
}

sub nithins {
    #our @arcb;
    #our @arca;
    local @arcb = @arcb;
    local @arca = @arca;
    foreach my $a (@arca) {
        for ( my $i = 0 ; $i < scalar(@arcb) ; $i++ ) {
            #delete $arcb[$i] if ( $a == $arcb[$i] );
            #fixed the warnings by using splice
            splice @arcb,$i,1 if ( $a == $arcb[$i] );
        }
    }
    # print "@arcb";
}

# karlgoethebier;
# Lotus1;
# LanX;
# davido;
# linuxer;
# nithins;

my $results = timethese(
    -20,
    {
        'karlgoethebier' => 'karlgoethebier',
        'Lotus1'         => 'Lotus1',
        'LanX'           => 'LanX',
        'davido'         => 'davido',
        'linuxer'        => 'linuxer',
        'nithins'        => 'nithins',
    }
);

cmpthese($results);


__END__

Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith
+ins for at least 20 CPU seconds...
      LanX: 21.5303 wallclock secs (21.14 usr +  0.38 sys = 21.52 CPU)
+ @ 143.25/s (n=3082)
    Lotus1: 20.4152 wallclock secs (20.27 usr +  0.14 sys = 20.41 CPU)
+ @ 24.55/s (n=501)
    davido: 20.1223 wallclock secs (19.72 usr +  0.38 sys = 20.09 CPU)
+ @ 110.78/s (n=2226)
karlgoethebier: 22.1035 wallclock secs (22.06 usr +  0.03 sys = 22.09 
+CPU) @ 22.22/s (n=491)
   linuxer: 21.5713 wallclock secs (21.14 usr +  0.42 sys = 21.56 CPU)
+ @ 147.39/s (n=3178)
   nithins: 21.1337 wallclock secs (21.09 usr +  0.00 sys = 21.09 CPU)
+ @  0.66/s (n=14)
                  Rate nithins karlgoethebier   Lotus1   davido     La
+nX linuxer
nithins        0.664/s      --           -97%     -97%     -99%    -10
+0%   -100%
karlgoethebier  22.2/s   3248%             --      -9%     -80%     -8
+4%    -85%
Lotus1          24.6/s   3599%            10%       --     -78%     -8
+3%    -83%
davido           111/s  16590%           398%     351%       --     -2
+3%    -25%
LanX             143/s  21482%           545%     483%      29%       
+--     -3%
linuxer          147/s  22106%           563%     500%      33%       
+3%      --


Note: the results below are the original version with the modified dat
+a sets from the commented out 'our' array declarations.

Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith
+ins for at least 20 CPU seconds...
      LanX: 24.0026 wallclock secs (22.73 usr +  0.00 sys = 22.73 CPU)
+ @ 608.25/s (n=13828)
    Lotus1: 22.2431 wallclock secs (21.50 usr +  0.30 sys = 21.80 CPU)
+ @ 27.53/s (n=600)
    davido: 20.17 wallclock secs (19.81 usr +  0.33 sys = 20.14 CPU) @
+ 199.25/s (n=4013)
karlgoethebier: 20.2918 wallclock secs (19.95 usr +  0.33 sys = 20.28 
+CPU) @ 14.35/s (n=291)
   linuxer: 20.1558 wallclock secs (19.87 usr +  0.26 sys = 20.14 CPU)
+ @ 321.91/s (n=6483)
   nithins: 20.9194 wallclock secs (20.89 usr +  0.00 sys = 20.89 CPU)
+ @  0.77/s (n=16)
                  Rate nithins karlgoethebier   Lotus1   davido linuxe
+r     LanX
nithins        0.766/s      --           -95%     -97%    -100%   -100
+%    -100%
karlgoethebier  14.3/s   1773%             --     -48%     -93%    -96
+%     -98%
Lotus1          27.5/s   3494%            92%       --     -86%    -91
+%     -95%
davido           199/s  25915%          1289%     624%       --    -38
+%     -67%
linuxer          322/s  41932%          2144%    1069%      62%      -
+-     -47%
LanX             608/s  79319%          4139%    2110%     205%     89
+%       --
[download]

updated with LanX's update

Comment on Re^2: Removing elemets from an array Download Code

Replies are listed 'Best First'.
Re^3: Removing elemets from an array by karlgoethebier (Abbot) on Dec 30, 2012 at 11:58 UTC
This is very interesting. But for the moment i don't have any idea why it is so. Perhaps some of the involved brothers would like to comment/interprete the result so we can learn something new? Best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re^4: Removing elemets from an array (optimization) by LanX (Saint) on Dec 30, 2012 at 17:15 UTC
Every operation in Perl - like any abstraction - has an overhead which can't be compensated by its C-implementation. So tending to use as few as possible perl-commands which do mass-operations means to reduce overhead and delaying the task to highly optimized C. Loops (including maps) are just multiplying the amount of executed commands (just imagine the linearized alternative which is even faster as the loop...) so my approach is the fastest because its basically reduced to only 3 perl commandsš 1. setting a hash 2. deleting a slice from that hash 3. reading the resulting hash OTOH my approach has drawbacks, depending on the task, it's only suitable for real sets of strings. Arrays can contain repeated data or other datatypes like refs. EDIT: you might be interested in Using hashes for set operations... Cheers Rolf PS: of course there are still loops working under the hood, but they are already optimized in C.	[reply]
Re^5: Removing elemets from an array (optimization) by karlgoethebier (Abbot) on Dec 31, 2012 at 18:00 UTC
ThanX - and especially for the link HNY and best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re^4: Removing elemets from an array by Lotus1 (Vicar) on Dec 30, 2012 at 16:44 UTC
Thanks for setting up the benchmark in the first place, it has been fun seeing the performance of the different approaches. Nested loops don't scale well. When there were only 2 items for the outer loop it was really fast. I've seen folks here toss around the Big O notation and although I hadn't heard of it before Perlmonks I figured out they were talking about linear growth as opposed to exponential or some other function for the execution time related to the size of the data set. The nested loop approach seems to be O(n^2) but I'm still learning so hopefully someone will weigh in if I'm off base.	[reply]
Re^5: Removing elemets from an array by karlgoethebier (Abbot) on Dec 31, 2012 at 18:02 UTC
Thank you and HNY. Best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]

In Section Seekers of Perl Wisdom