
Re^3: Best method to diff very large array efficiently

by Kenosis (Priest)
on Nov 26, 2013 at 18:15 UTC


in reply to Re^2: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

I wouldn't have predicted such a dramatic performance disparity across Perl versions. Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks. Greatly appreciate your informative reply!
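One way to make that routine is to have the benchmark script report its own interpreter details; a minimal sketch using only core features ($^V and the Config module):

  use strict;
  use warnings;
  use Config;

  # Report the interpreter's version and build architecture next to
  # the benchmark results, so figures stay comparable later.
  printf "Perl %vd, %s\n", $^V, $Config{archname};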


Replies are listed 'Best First'.
Re^4: Best method to diff very large array efficiently
by LanX (Saint) on Nov 26, 2013 at 18:53 UTC
    My benchmark was run on v5.10.0 built for i486-linux-gnu-thread-multi, and vec was by far the slowest.

    3 differences:

    • I avoided allocating useless arrays for the result
    • I only tested with core modules (too lazy to install the non-core ones)
    • Last but not least: I tested with random numbers drawn from 1..1e6, while you used compact intervals!

    Obviously vec scales worse as the distribution of values becomes sparser...

    IMO not very surprising.
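
    For readers unfamiliar with the approach under discussion, a minimal sketch of a vec-based diff (a plain reconstruction of the general technique, not the code from either benchmark). The bit string is indexed by value, so its size is driven by the largest value present rather than by the number of elements:

      use strict;
      use warnings;

      my @a = ( 1, 5, 9, 1_000_000 );
      my @b = ( 5, 1_000_000 );

      # One bit per possible value: the string needs roughly max(@b)/8
      # bytes no matter how few elements @b actually contains.
      my $seen = '';
      vec( $seen, $_, 1 ) = 1 for @b;

      # Elements of @a that are not in @b
      my @only_in_a = grep { !vec( $seen, $_, 1 ) } @a;
      print "@only_in_a\n";    # prints: 1 9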

    update

    found bug in benchmark, will correct later. vec still among slowest...

    update

    Thanks to BrowserUk for vividly commenting twice that the benchmark is broken, after I already mentioned that the benchmark is buggy.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      > found bug in benchmark, will correct later. vec still among slowest

      as promised:

      Integrity of the results is tested in a way that excludes possible side effects of the benchmark routines.

      Benchmarks are run with different datasets of shrinking density (the "vec" approach breaks above 1e10).

      Result: "vec" doesn't scale well for larger values...
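
      A minimal sketch of what such an integrity check could look like (stand-in diff routines and toy data, not the actual benchmark code):

        use strict;
        use warnings;
        use Test::More;

        my @a = ( 1 .. 1000 );
        my @b = grep { $_ % 7 } @a;    # drop the multiples of 7

        # Stand-ins for the benchmarked candidates; each returns a
        # reference to the elements of @a that are not in @b.
        my %routine = (
            hash_grep => sub {
                my %in_b = map { $_ => 1 } @b;
                [ grep { !$in_b{$_} } @a ];
            },
            hash_key => sub {
                my %h;
                @h{@a} = ();
                delete @h{@b};
                [ sort { $a <=> $b } keys %h ];
            },
        );

        # Cross-check the candidates against each other, outside the
        # timing loop, so a routine that silently skips the real work
        # is caught before its numbers are trusted.
        my @names = sort keys %routine;
        for my $i ( 0 .. $#names - 1 ) {
            is_deeply(
                $routine{ $names[$i] }->(),
                $routine{ $names[ $i + 1 ] }->(),
                "$names[$i] ~~ $names[$i+1]"
            );
        }
        done_testing();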

      Perlversion v5.10.0
      Setting values range: 1..1e4
      ok 1 - hash_grep ~~ hash_key (3231 entries)
      ok 2 - hash_key ~~ hash_values (3231 entries)
      ok 3 - hash_values ~~ using_vec (3231 entries)
      ok 4 - using_vec ~~ hash_grep (3231 entries)
      ok 5 - hash_grep ~~ hash_key (3231 entries)
      ok 6 - hash_key ~~ hash_values (3231 entries)
      ok 7 - hash_values ~~ using_vec (3231 entries)
      1..7
      Setting values range: 1..1e4
                     Rate hash_values hash_key using_vec hash_grep
      hash_values  29.8/s          --     -45%      -48%      -48%
      hash_key     53.8/s         81%       --       -7%       -7%
      using_vec    57.7/s         93%       7%        --       -0%
      hash_grep    57.7/s         93%       7%        0%        --
      Setting values range: 1..1e6
                     Rate hash_values hash_key using_vec hash_grep
      hash_values  28.5/s          --     -24%      -24%      -25%
      hash_key     37.3/s         31%       --       -0%       -2%
      using_vec    37.3/s         31%       0%        --       -2%
      hash_grep    38.2/s         34%       2%        2%        --
      Setting values range: 1..1e8
                     Rate using_vec hash_values hash_grep hash_key
      using_vec    17.1/s        --        -41%      -53%     -57%
      hash_values  29.1/s       70%          --      -20%     -26%
      hash_grep    36.2/s      112%         24%        --      -8%
      hash_key     39.5/s      131%         36%        9%       --
      Setting values range: 1..1e9
                     Rate using_vec hash_values hash_grep hash_key
      using_vec    3.29/s        --        -88%      -91%     -92%
      hash_values  28.2/s      759%          --      -21%     -27%
      hash_grep    36.0/s      993%         27%        --      -7%
      hash_key     38.8/s     1081%         38%        8%       --
      Compilation finished at Wed Nov 27 02:08:15
      code:

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      > I avoided allocating useless arrays for the result

      Your benchmark is completely unrealistic, a chalk & cheese comparison, and thus totally broken.

      1. keys & values used in void context:

        From perlfunc:

        • "In particular, calling keys() in void context resets the iterator with no other overhead."
        • "(In particular, calling values() in void context resets the iterator with no other overhead.)"
      2. Whilst grep in a void context probably avoids generating the return list, it doesn't stop it from iterating over the entire input list.

      Benchmark code that doesn't actually produce the required result is broken. And broken is just broken: of no value at all.
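
      To make the void-context point concrete, a small sketch (not the disputed benchmark itself) contrasting keys() in void context with actually collecting the keys:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        my %h = map { $_ => 1 } 1 .. 100_000;

        cmpthese( -1, {
            # keys() in void context merely resets the hash iterator,
            # so this candidate "wins" without doing the real work.
            void_keys => sub { keys %h; return },

            # Building the key list is the work the task actually needs.
            list_keys => sub { my @k = keys %h; return },
        } );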

      > Obviously vec scales worse as the distribution of values becomes sparser...

      Think again.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      This just gets funnier and funnier.

      These are the results from Kenosis's benchmark:

      C:\test>1064178-b.pl
                        Rate OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
      OPdiff          90.9/s     --      -47%           -52%         -55%      -64%
      hash_grep        170/s    87%        --           -10%         -17%      -33%
      OPdiffModified   189/s   108%       11%             --          -7%      -26%
      OPdiff_undef     204/s   124%       20%             8%           --      -20%
      using_vec        254/s   180%       49%            35%          25%        --

      And these from your benchmark:

      C:\test>junk.pl
      hash_grep: *CMP::hash_grep
      hash_key_diff: *CMP::hash_key_diff
      hash_values_diff: *CMP::hash_values_diff
      using_vec: *CMP::using_vec
                            Rate using_vec hash_values_diff hash_key_diff hash_grep
      using_vec          69161/s        --             -81%          -84%      -88%
      hash_values_diff  363049/s      425%               --          -14%      -39%
      hash_key_diff     421810/s      510%              16%            --      -29%
      hash_grep         594454/s      760%              64%           41%        --

      Forget the order of the results; note that all your results are something like 2000 times faster, even though you're using pretty much the same-sized arrays as Kenosis. Do you have some magic touch that makes computers work so much harder for you?

      Oh, look at what your complicated way of building a coderef hash produces:

      my $href = pckg_subs();
      pp $href;

      {
        hash_grep        => sub { "???" },
        hash_key_diff    => sub { "???" },
        hash_values_diff => sub { "???" },
        using_vec        => sub { "???" },
      }

      Like I said. Broken is just broken.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Best method to diff very large array efficiently
by BrowserUk (Patriarch) on Nov 26, 2013 at 19:49 UTC
    > I wouldn't have predicted such a dramatic performance disparity across Perl versions.

    It took me by surprise also.

    I've done, and posted, this hashes-versus-vec benchmark many times over the years; yours was the first challenge to what I took to be simple fact.

    It's yet another plank in my rapidly growing conclusion that 5.10.1 was 'peak Perl'.

    > Have updated my benchmarking post with "(Perl v5.14.2 64-bit)" - which I now think should always be included in benchmarks.

    I wholeheartedly concur and will endeavour to do the same in future.

    > Greatly appreciate your informative reply!

    We both learned something. That's win-win. You can't ask for more :)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
