http://www.perlmonks.org?node_id=1064370


in reply to Re: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

Interesting. Here are the results of your benchmark (-Set::Scalar) run using my default perl (5.10.1 64-bit):

C:\test>1064178-b.pl
                 Rate listCompare OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
listCompare    12.9/s          --   -86%      -93%           -94%         -94%      -95%
OPdiff         95.1/s        639%     --      -48%           -52%         -53%      -65%
hash_grep       185/s       1334%    94%        --            -8%          -9%      -32%
OPdiffModified  200/s       1452%   110%        8%             --          -2%      -27%
OPdiff_undef    203/s       1478%   114%       10%             2%           --      -26%
using_vec       273/s       2019%   187%       48%            37%          34%        --

And this using 5.18 64-bit (also minus List::Compare):

C:\test>\perl5.18\bin\perl 1064178-b.pl
                Rate OPdiff using_vec hash_grep OPdiffModified OPdiff_undef
OPdiff         126/s     --      -22%      -31%           -43%         -44%
using_vec      162/s    28%        --      -12%           -26%         -28%
hash_grep      183/s    45%       13%        --           -17%         -19%
OPdiffModified 220/s    74%       36%       20%             --          -2%
OPdiff_undef   225/s    79%       39%       23%             2%           --

They've really screwed up vec. ( Along with substr and a bunch of others :( )
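The benchmark source is not reproduced in this post, so the following is only a sketch of what the two approaches being compared presumably look like. The sub names hash_grep and using_vec come from the result tables; the bodies and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real benchmark used much larger arrays.
my @a = (1 .. 10);
my @b = (3, 4, 5, 11);

# Hash approach: record every element of @$y as a hash key,
# then keep the elements of @$x that are not recorded.
sub hash_grep {
    my ($x, $y) = @_;
    my %seen;
    @seen{@$y} = ();                       # keys only, values stay undef
    return [ grep { !exists $seen{$_} } @$x ];
}

# Bit-vector approach: set bit N for every N in @$y, then test bits.
# This only works for non-negative integers.
sub using_vec {
    my ($x, $y) = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @$y;
    return [ grep { !vec($bits, $_, 1) } @$x ];
}

my $diff_hash = hash_grep(\@a, \@b);
my $diff_vec  = using_vec(\@a, \@b);
print "hash_grep: @$diff_hash\n";
print "using_vec: @$diff_vec\n";
```

Both subs return the elements of the first array that do not occur in the second, so any timing difference between them comes from the lookup structure rather than from the result.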


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Best method to diff very large array efficiently
by Kenosis (Priest) on Nov 26, 2013 at 18:15 UTC

    I wouldn't have predicted such a dramatic performance disparity across Perl versions. Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks. Greatly appreciate your informative reply!

      my benchmark was run on v5.10.0 built for i486-linux-gnu-thread-multi and vec was by far the slowest.

      3 differences:

      • I avoided allocating useless arrays for the result
      • I only tested approaches that need no non-core modules (too lazy to install them)
      • LBNL: I tested with random numbers out of 1..1e6 and you took compact intervals!

      Obviously vec scales badly the sparser the distribution of values becomes...

      IMO not very surprising.
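One property of vec that bears on this claim can be checked directly: the string behind a bit vector grows with the largest value stored, not with the number of values. The sketch below only measures that storage. The helper name vec_bytes is hypothetical, and whether this accounts for the timings is contested further down the thread.

```perl
use strict;
use warnings;

# Returns the number of bytes the bit vector occupies
# after setting one bit per value.
sub vec_bytes {
    my @values = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @values;
    return length $bits;
}

my $dense  = vec_bytes(1 .. 1000);         # 1000 values, largest is 1000
my $sparse = vec_bytes(1, 1_000_000);      # 2 values, largest is 1e6

print "dense:  $dense bytes\n";            # 126 bytes
print "sparse: $sparse bytes\n";           # 125001 bytes
```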

      update

      found bug in benchmark, will correct later. vec still among slowest...

      update

      Thanks to BrowserUk for vividly commenting twice that the benchmark is broken, after I already mentioned that the benchmark is buggy.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        > found bug in benchmark, will correct later. vec still among slowest

        as promised:

        The integrity of the results is tested in a way that excludes possible side effects of the benchmark routines.

        Benchmarks are run with different datasets with shrinking densities ("vec"-approach breaks >1e10)

        Result: "vec" doesn't scale well for larger values...
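The code for this post is not reproduced here, so the following is only a sketch of the kind of integrity check described above: run every candidate once outside the timed loop and confirm they all return the same difference. The data sizes, the fixed seed and the plain string comparison are assumptions; the original output suggests a TAP-style test with smartmatch instead.

```perl
use strict;
use warnings;

srand(42);                                 # fixed seed so runs are repeatable
my @a = map { 1 + int rand 1e4 } 1 .. 2000;
my @b = map { 1 + int rand 1e4 } 1 .. 2000;

# Hypothetical candidates; each returns the elements of @a not found in @b.
my %impl = (
    hash_grep => sub {
        my %seen;
        @seen{@b} = ();
        return [ grep { !exists $seen{$_} } @a ];
    },
    using_vec => sub {
        my $bits = '';
        vec($bits, $_, 1) = 1 for @b;
        return [ grep { !vec($bits, $_, 1) } @a ];
    },
);

# Run each candidate once and compare its result with a reference result.
my %result = map { ( $_ => join(',', @{ $impl{$_}->() }) ) } keys %impl;
my ($reference, @others) = sort keys %result;
my $all_agree = 1;
for my $name (@others) {
    my $same = $result{$name} eq $result{$reference};
    $all_agree &&= $same;
    printf "%s - %s vs %s\n", ($same ? 'ok' : 'not ok'), $reference, $name;
}
```

Only when every comparison passes do the timings describe equivalent work.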

        Perlversion v5.10.0
        Setting values range: 1..1e4
        ok 1 - hash_grep ~~ hash_key (3231 entries)
        ok 2 - hash_key ~~ hash_values (3231 entries)
        ok 3 - hash_values ~~ using_vec (3231 entries)
        ok 4 - using_vec ~~ hash_grep (3231 entries)
        ok 5 - hash_grep ~~ hash_key (3231 entries)
        ok 6 - hash_key ~~ hash_values (3231 entries)
        ok 7 - hash_values ~~ using_vec (3231 entries)
        1..7
        Setting values range: 1..1e4
                      Rate hash_values hash_key using_vec hash_grep
        hash_values 29.8/s          --     -45%      -48%      -48%
        hash_key    53.8/s         81%       --       -7%       -7%
        using_vec   57.7/s         93%       7%        --       -0%
        hash_grep   57.7/s         93%       7%        0%        --
        Setting values range: 1..1e6
                      Rate hash_values hash_key using_vec hash_grep
        hash_values 28.5/s          --     -24%      -24%      -25%
        hash_key    37.3/s         31%       --       -0%       -2%
        using_vec   37.3/s         31%       0%        --       -2%
        hash_grep   38.2/s         34%       2%        2%        --
        Setting values range: 1..1e8
                      Rate using_vec hash_values hash_grep hash_key
        using_vec   17.1/s        --        -41%      -53%     -57%
        hash_values 29.1/s       70%          --      -20%     -26%
        hash_grep   36.2/s      112%         24%        --      -8%
        hash_key    39.5/s      131%         36%        9%       --
        Setting values range: 1..1e9
                      Rate using_vec hash_values hash_grep hash_key
        using_vec   3.29/s        --        -88%      -91%     -92%
        hash_values 28.2/s      759%          --      -21%     -27%
        hash_grep   36.0/s      993%         27%        --      -7%
        hash_key    38.8/s     1081%         38%        8%       --
        Compilation finished at Wed Nov 27 02:08:15
        code:

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        I avoided allocating useless arrays for the result

        Your benchmark is completely unrealistic, chalk & cheese comparison, and thus totally broken.

        1. keys & values used in void context:

          From perlfunc:

          • "In particular, calling keys() in void context resets the iterator with no other overhead."
          • "(In particular, calling values() in void context resets the iterator with no other overhead.)"
        2. Whilst grep in a void context probably avoids generating the return list, it doesn't stop it from iterating the entire input list.
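Both points can be checked with a few lines. This is a minimal sketch with made-up data, not the code under discussion: it shows that keys in void context only resets the hash iterator, while grep in void context still runs its block for every element.

```perl
use strict;
use warnings;

my %h = map { ( $_ => 1 ) } 1 .. 1000;

my @list = keys %h;                # list context: really builds 1000 keys
my $n    = keys %h;                # scalar context: just the count

my ($first, $again, $calls);
{
    no warnings 'void';            # silence "Useless use of ... in void context"

    ($first) = each %h;            # advance the iterator by one entry
    keys %h;                       # void context: resets the iterator, builds nothing
    ($again) = each %h;            # so each() starts over at the same entry

    $calls = 0;
    grep { $calls++ } 1 .. 1000;   # void context: the block still runs 1000 times
}

printf "list: %d, count: %d, iterator reset: %s, grep calls: %d\n",
    scalar(@list), $n, ($first eq $again ? 'yes' : 'no'), $calls;
```

A benchmark routine that wants to time the list-building work therefore has to use the result, for example by assigning it to an array.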

        Benchmark code that doesn't actually produce the required result is broken. And broken, is just broken; of no value at all.

        Obviously vec scales badly the sparser the distribution of values becomes...

        Think again.



        This just gets funnier and funnier.

        These are the results from Kenosis' benchmark:

        C:\test>1064178-b.pl
                         Rate OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
        OPdiff         90.9/s     --      -47%           -52%         -55%      -64%
        hash_grep       170/s    87%        --           -10%         -17%      -33%
        OPdiffModified  189/s   108%       11%             --          -7%      -26%
        OPdiff_undef    204/s   124%       20%             8%           --      -20%
        using_vec       254/s   180%       49%            35%          25%        --

        And these from your benchmark:

        C:\test>junk.pl
        hash_grep: *CMP::hash_grep
        hash_key_diff: *CMP::hash_key_diff
        hash_values_diff: *CMP::hash_values_diff
        using_vec: *CMP::using_vec
                             Rate using_vec hash_values_diff hash_key_diff hash_grep
        using_vec         69161/s        --             -81%          -84%      -88%
        hash_values_diff 363049/s      425%               --          -14%      -39%
        hash_key_diff    421810/s      510%              16%            --      -29%
        hash_grep        594454/s      760%              64%           41%        --

        Forget the order of the results; note that all your results are something like 2000 times faster, even though you're using pretty much the same sized arrays as Kenosis. Do you have some magic touch that makes computers work so much harder for you?

        Oh, and look what your complicated way of building a coderef hash produces:

        my $href = pckg_subs();
        pp $href;

        {
          hash_grep        => sub { "???" },
          hash_key_diff    => sub { "???" },
          hash_values_diff => sub { "???" },
          using_vec        => sub { "???" },
        }

        Like I said. Broken is just broken.


      I wouldn't have predicted such a dramatic performance disparity across Perl versions.

      It took me by surprise also.

      I've done, and posted, this hashes versus vec benchmark many times over the years, yours was the first challenge to what I took to be simple fact.

      It's yet another plank in my rapidly growing conclusion that 5.10.1 was 'peak Perl'.

      Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks.

      I wholeheartedly concur and will endeavour to do the same in future.
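One way to make that automatic is to have the benchmark script print the interpreter version and architecture itself, ahead of the results. This is a minimal sketch under assumptions: the two timed subs and the data are placeholders, not the routines from this thread.

```perl
use strict;
use warnings;
use Config;
use Benchmark qw(cmpthese);

# Version and build of the perl actually running the benchmark.
my $header = sprintf "Perl %vd (%s)", $^V, $Config{archname};
print "$header\n";

# Placeholder workload: set membership via a hash versus a bit vector.
my @data = (1 .. 1000);

cmpthese(-1, {
    hash => sub {
        my %seen;
        @seen{@data} = ();
        my @hits = grep { exists $seen{$_} } @data;
    },
    vec => sub {
        my $bits = '';
        vec($bits, $_, 1) = 1 for @data;
        my @hits = grep { vec($bits, $_, 1) } @data;
    },
});
```

Because the header is produced by the same interpreter that runs the timings, a run under a different perl (such as the \perl5.18\bin\perl invocation above) labels itself correctly.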

      Greatly appreciate your informative reply!

      We both learned something. That's win-win. You can't ask for more :)

