Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^5: Best method to diff very large array efficiently

by LanX (Canon)
on Nov 27, 2013 at 01:21 UTC ( #1064510=note: print w/ replies, xml ) Need Help??


in reply to Re^4: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

> found bug in benchmark, will correct later. vec still among slowest

as promised:

Integrity of the results are tested in a way that excludes possible sideeffects of benchmark-routines.

Benchmarks are run with different datasets with shrinking densities ("vec"-approach breaks >1e10)

Result: "vec" doesn't scale well for larger values...

Perlversion v5.10.0 Setting values range: 1..1e4 ok 1 - hash_grep ~~ hash_key (3231 entries) ok 2 - hash_key ~~ hash_values (3231 entries) ok 3 - hash_values ~~ using_vec (3231 entries) ok 4 - using_vec ~~ hash_grep (3231 entries) ok 5 - hash_grep ~~ hash_key (3231 entries) ok 6 - hash_key ~~ hash_values (3231 entries) ok 7 - hash_values ~~ using_vec (3231 entries) 1..7 Setting values range: 1..1e4 Rate hash_values hash_key using_vec hash_grep hash_values 29.8/s -- -45% -48% -48% hash_key 53.8/s 81% -- -7% -7% using_vec 57.7/s 93% 7% -- -0% hash_grep 57.7/s 93% 7% 0% -- Setting values range: 1..1e6 Rate hash_values hash_key using_vec hash_grep hash_values 28.5/s -- -24% -24% -25% hash_key 37.3/s 31% -- -0% -2% using_vec 37.3/s 31% 0% -- -2% hash_grep 38.2/s 34% 2% 2% -- Setting values range: 1..1e8 Rate using_vec hash_values hash_grep hash_key using_vec 17.1/s -- -41% -53% -57% hash_values 29.1/s 70% -- -20% -26% hash_grep 36.2/s 112% 24% -- -8% hash_key 39.5/s 131% 36% 9% -- Setting values range: 1..1e9 Rate using_vec hash_values hash_grep hash_key using_vec 3.29/s -- -88% -91% -92% hash_values 28.2/s 759% -- -21% -27% hash_grep 36.0/s 993% 27% -- -7% hash_key 38.8/s 1081% 38% 8% -- Compilation finished at Wed Nov 27 02:08:15
code:
use strict; use warnings; use Benchmark qw/cmpthese/; use Data::Dump qw/pp/; use feature 'say'; use Test::More; say "Perlversion $^V"; my (@arr_1,@arr_2); sub init_data{ my $range_limit=shift; say "\n\n Setting values range: 1..$range_limit\n"; my %unique; $unique{int rand $range_limit}=undef while keys %unique <8000; @arr_1=keys %unique; %unique=(); $unique{int rand $range_limit}=undef while keys %unique <6000; @arr_2=keys %unique; } { package CMP; my @res= ("") x 8000; sub hash_values_diff { my %diff3; @diff3{@arr_1} = @arr_1; delete @diff3{@arr_2}; @res = values %diff3 ; } sub hash_key_diff { my %diff3; undef @diff3{@arr_1}; delete @diff3{@arr_2}; @res = keys %diff3 ; } sub using_vec_diff { my $vec = ''; vec( $vec, $_, 1 ) = 1 for @arr_2; @res = grep !vec( $vec, $_, 1 ), @arr_1; } sub hash_grep_diff { my %arr_2_hash; undef @arr_2_hash{@arr_2}; @res = grep !exists $arr_2_hash{$_}, @arr_1; } } #--- Test benchmarks for correctness init_data("1e4"); test_subs('_diff$',"CMP"); #--- Compare benchmarks foreach my $range (qw/1e4 1e6 1e8 1e9/) { init_data($range); cmpthese(-5, pckg_subs('_diff$',"CMP") ); } #====== Utility functions for smarter benchmarks sub pckg_subs { my $filter = shift // qr[.]; my $pckg_name = shift // "CMP"; my $stash = do { no strict 'refs'; \ %{ "${pckg_name}::" }; }; # filter all subs from package my $codehashref; while (my ($name,$glob)= each %$stash) { if ( my $cref = *{$glob}{CODE} and $name =~ s/$filter// ) { # print "$name:\t$glob\n"; $codehashref->{$name}=$cref; } } return $codehashref; } sub test_subs { my $h_subs=pckg_subs(@_); my ($last_name,@last_res,$name,$code,@res); # compair pairwise, loop twice to exclude sideeffects for (1..2) { while (($name,$code) =each %$h_subs ) { @res = sort $code->(); is_deeply(\@last_res,\@res,"$last_name ~~ $name (". scalar @res +. " entries)") if $last_name; @last_res = @res; $last_name = $name; } } done_testing( ); }

Cheers Rolf

( addicted to the Perl Programming Language)


Comment on Re^5: Best method to diff very large array efficiently
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1064510]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (13)
As of 2014-12-29 16:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (193 votes), past polls