Re^2: Best method to diff very large array efficiently

in reply to Re: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

Interesting. Here are the results of your benchmark (minus Set::Scalar) run using my default perl (5.10.1 64-bit):

C:\test>1064178-b.pl
                 Rate listCompare OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
listCompare    12.9/s          --   -86%      -93%           -94%         -94%      -95%
OPdiff         95.1/s        639%     --      -48%           -52%         -53%      -65%
hash_grep       185/s       1334%    94%        --            -8%          -9%      -32%
OPdiffModified  200/s       1452%   110%        8%             --          -2%      -27%
OPdiff_undef    203/s       1478%   114%       10%             2%           --      -26%
using_vec       273/s       2019%   187%       48%            37%          34%        --

And this using 5.18 64-bit (also minus List::Compare):

C:\test>\perl5.18\bin\perl 1064178-b.pl
                Rate OPdiff using_vec hash_grep OPdiffModified OPdiff_undef
OPdiff         126/s     --      -22%      -31%           -43%         -44%
using_vec      162/s    28%        --      -12%           -26%         -28%
hash_grep      183/s    45%       13%        --           -17%         -19%
OPdiffModified 220/s    74%       36%       20%             --          -2%
OPdiff_undef   225/s    79%       39%       23%             2%           --

They've really screwed up vec. ( Along with substr and a bunch of others :( )
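
For readers without the benchmark script to hand, here is a minimal sketch of the two approaches being compared. The sub bodies and the sample data are assumptions for illustration, not the code from 1064178-b.pl; only the names hash_grep and using_vec come from the results above.

```perl
use strict;
use warnings;

# Hypothetical inputs: report the values in @a that do not appear in @b.
my @a = (1 .. 10);
my @b = (2, 4, 6, 8, 10);

# Hash lookup: one hash key per value in @b.
sub hash_grep {
    my ($x, $y) = @_;
    my %seen;
    @seen{@$y} = ();
    return grep { !exists $seen{$_} } @$x;
}

# Bit vector: one bit per possible value, so the values must be
# non-negative integers.
sub using_vec {
    my ($x, $y) = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @$y;
    return grep { !vec($bits, $_, 1) } @$x;
}

print "hash_grep: @{[ hash_grep(\@a, \@b) ]}\n";
print "using_vec: @{[ using_vec(\@a, \@b) ]}\n";
```

Both subs return the same list here; the benchmark question is only which one gets there faster on a given perl.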

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Best method to diff very large array efficiently by Kenosis (Priest) on Nov 26, 2013 at 18:15 UTC
I wouldn't have predicted such a dramatic performance disparity across Perl versions. Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks. Greatly appreciate your informative reply!
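
One way to make the version note automatic is to have the benchmark script print it. This is a sketch only: the helper name perl_banner is made up, and reporting the integer width via `$Config{ivsize}` is an assumption about what "64-bit" should mean in the banner.

```perl
use strict;
use warnings;
use Config;

# Build a one-line description of the running interpreter
# to print above the benchmark results.
sub perl_banner {
    return sprintf "Perl v%vd, %s, %d-bit integers",
        $^V, $Config{archname}, 8 * $Config{ivsize};
}

print perl_banner(), "\n";
```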
Re^4: Best method to diff very large array efficiently by LanX (Saint) on Nov 26, 2013 at 18:53 UTC
My benchmark was run on `v5.10.0 built for i486-linux-gnu-thread-multi` and `vec` was by far the slowest.

3 differences:

- I avoided allocating useless arrays for the result
- Only tested without non-core modules (too lazy to install)
- Last but not least: I tested with random numbers out of 1..1e6 and you took compact intervals!

Obviously `vec` scales badly the sparser the distribution of values becomes... IMO not very surprising.

update: found a bug in the benchmark, will correct later. vec still among slowest...

update: Thanks to BrowserUk for vividly commenting twice that the benchmark is broken, after I already mentioned that the benchmark is buggy.

Cheers Rolf
( addicted to the Perl Programming Language)
Re^5: Best method to diff very large array efficiently by LanX (Saint) on Nov 27, 2013 at 01:21 UTC
> found bug in benchmark, will correct later. vec still among slowest

As promised:

- Integrity of the results is tested in a way that excludes possible side effects of the benchmark routines.
- Benchmarks are run with different datasets with shrinking densities (the "vec" approach breaks >1e10).

Result: "vec" doesn't scale well for larger values...

Perlversion v5.10.0
Setting values range: 1..1e4
ok 1 - hash_grep ~~ hash_key (3231 entries)
ok 2 - hash_key ~~ hash_values (3231 entries)
ok 3 - hash_values ~~ using_vec (3231 entries)
ok 4 - using_vec ~~ hash_grep (3231 entries)
ok 5 - hash_grep ~~ hash_key (3231 entries)
ok 6 - hash_key ~~ hash_values (3231 entries)
ok 7 - hash_values ~~ using_vec (3231 entries)
1..7
Setting values range: 1..1e4
              Rate hash_values hash_key using_vec hash_grep
hash_values 29.8/s          --     -45%      -48%      -48%
hash_key    53.8/s         81%       --       -7%       -7%
using_vec   57.7/s         93%       7%        --       -0%
hash_grep   57.7/s         93%       7%        0%        --
Setting values range: 1..1e6
              Rate hash_values hash_key using_vec hash_grep
hash_values 28.5/s          --     -24%      -24%      -25%
hash_key    37.3/s         31%       --       -0%       -2%
using_vec   37.3/s         31%       0%        --       -2%
hash_grep   38.2/s         34%       2%        2%        --
Setting values range: 1..1e8
              Rate using_vec hash_values hash_grep hash_key
using_vec   17.1/s        --        -41%      -53%     -57%
hash_values 29.1/s       70%          --      -20%     -26%
hash_grep   36.2/s      112%         24%        --      -8%
hash_key    39.5/s      131%         36%        9%       --
Setting values range: 1..1e9
              Rate using_vec hash_values hash_grep hash_key
using_vec   3.29/s        --        -88%      -91%     -92%
hash_values 28.2/s      759%          --      -21%     -27%
hash_grep   36.0/s      993%         27%        --      -7%
hash_key    38.8/s     1081%         38%        8%       --

Compilation finished at Wed Nov 27 02:08:15

Cheers Rolf
( addicted to the Perl Programming Language)
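
A small sketch of why the value range matters to a bit-vector approach: the string behind `vec` has to reach the largest offset used, however few bits are set. The helper name vec_bytes is hypothetical, and this only measures string length; it says nothing about where the time in the runs above actually goes.

```perl
use strict;
use warnings;

# Length in bytes of a bit string after setting a single bit at offset $max.
sub vec_bytes {
    my ($max) = @_;
    my $bits = '';
    vec($bits, $max, 1) = 1;    # Perl zero-extends the string up to this bit
    return length $bits;
}

for my $max (1e4, 1e6, 1e8) {
    printf "max value %d -> %d bytes\n", $max, vec_bytes($max);
}
```

The length comes out at int($max / 8) + 1 bytes, so a range of 1..1e9 needs roughly 125 MB for the string alone.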
Re^5: Best method to diff very large array efficiently by BrowserUk (Patriarch) on Nov 26, 2013 at 20:00 UTC
> I avoided allocating useless arrays for the result

Your benchmark is completely unrealistic, a chalk & cheese comparison, and thus totally broken.

keys & values used in void context: From perlfunc:

"In particular, calling keys() in void context resets the iterator with no other overhead."

"(In particular, calling values() in void context resets the iterator with no other overhead.)"

Whilst grep in a void context probably avoids generating the return list, it doesn't stop it from iterating the entire input list.

Benchmark code that doesn't actually produce the required result is broken. And broken is just broken; of no value at all.

> Obviously vec scales badly the sparser the distribution of values become...

Think again.
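
To illustrate the point about producing the required result, here is a sketch of a benchmark in which every candidate builds and returns its result list, and the lists are compared before anything is timed. The data, the sub bodies and the one-second run time are assumptions; this is not the code from either benchmark in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 1000 values to check against 500 values to exclude.
my @a = (1 .. 1000);
my @b = map { 2 * $_ } 1 .. 500;

my %tests = (
    hash_grep => sub {
        my %seen;
        @seen{@b} = ();
        my @diff = grep { !exists $seen{$_} } @a;    # result is built and kept
        return \@diff;
    },
    using_vec => sub {
        my $bits = '';
        vec($bits, $_, 1) = 1 for @b;
        my @diff = grep { !vec($bits, $_, 1) } @a;   # same work, same output
        return \@diff;
    },
);

# Check that every candidate yields the same list before timing anything.
my %result = map { $_ => join(',', @{ $tests{$_}->() }) } keys %tests;
die "results differ" unless $result{hash_grep} eq $result{using_vec};

cmpthese(-1, \%tests);    # run each candidate for at least 1 CPU second
```

Assigning the grep output to an array means no candidate is spared the work of generating its list.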
Re^5: Best method to diff very large array efficiently by BrowserUk (Patriarch) on Nov 26, 2013 at 20:34 UTC
This just gets funnier and funnier. These are the results from Kenosis' benchmark:

C:\test>1064178-b.pl
                 Rate OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
OPdiff         90.9/s     --      -47%           -52%         -55%      -64%
hash_grep       170/s    87%        --           -10%         -17%      -33%
OPdiffModified  189/s   108%       11%             --          -7%      -26%
OPdiff_undef    204/s   124%       20%             8%           --      -20%
using_vec       254/s   180%       49%            35%          25%        --

And these from your benchmark:

C:\test>junk.pl
hash_grep: CMP::hash_grep
hash_key_diff: CMP::hash_key_diff
hash_values_diff: CMP::hash_values_diff
using_vec: CMP::using_vec
                     Rate using_vec hash_values_diff hash_key_diff hash_grep
using_vec         69161/s        --             -81%          -84%      -88%
hash_values_diff 363049/s      425%               --          -14%      -39%
hash_key_diff    421810/s      510%              16%            --      -29%
hash_grep        594454/s      760%              64%           41%        --

Forget the order of the results; note that all your results are something like 2000 times faster, even though you're using pretty much the same sized arrays as Kenosis. Do you have some magic touch that makes computers work so much harder for you?

Oh, and look what your complicated way of building a coderef hash produces:

my $href = pckg_subs();
pp $href;
{
  hash_grep => sub { "???" },
  hash_key_diff => sub { "???" },
  hash_values_diff => sub { "???" },
  using_vec => sub { "???" },
}

Like I said. Broken is just broken.
Re^4: Best method to diff very large array efficiently by BrowserUk (Patriarch) on Nov 26, 2013 at 19:49 UTC
> I wouldn't have predicted such a dramatic performance disparity across Perl versions.

It took me by surprise also. I've done, and posted, this hashes versus vec benchmark many times over the years; yours was the first challenge to what I took to be simple fact.

It's yet another plank in my rapidly growing conclusion that 5.10.1 was 'peak Perl'.

> Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks.

I wholeheartedly concur and will endeavour to do the same in future.

> Greatly appreciate your informative reply!

We both learned something. That's win-win. You can't ask for more :)
