PerlMonks  

Re^4: Best method to diff very large array efficiently

by LanX (Saint)
on Nov 26, 2013 at 18:53 UTC


in reply to Re^3: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

my benchmark was run on v5.10.0 built for i486-linux-gnu-thread-multi and vec was by far the slowest.

3 differences:

  • I avoided allocating useless arrays for the result
  • I only tested the approaches that need no non-core modules (too lazy to install any)
  • Last but not least: I tested with random numbers out of 1..1e6, and you took compact intervals!

Obviously vec scales badly the sparser the distribution of values becomes...
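For readers who have not seen the thread's code, here is a minimal sketch of what a vec-based diff might look like. The helper name `diff_using_vec` and the sample data are assumptions for illustration, not the benchmarked code. It only shows that the bit vector's size follows the largest value rather than the number of elements; whether that dominates the runtime is exactly what is disputed further down.

```perl
use strict;
use warnings;

# Hypothetical helper: elements of @$a_ref that are not in @$b_ref,
# using a bit vector indexed by value (non-negative integers only).
sub diff_using_vec {
    my ($a_ref, $b_ref) = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @$b_ref;            # mark every value seen in B
    my @only_in_a = grep { !vec($bits, $_, 1) } @$a_ref;
    return (\@only_in_a, length $bits);           # diff plus vector size in bytes
}

# Same number of elements, but the second B contains one much larger value.
my ($dense,  $dense_bytes)  = diff_using_vec([1, 2, 3, 4], [2, 4, 1000]);
my ($sparse, $sparse_bytes) = diff_using_vec([1, 2, 3, 4], [2, 4, 1_000_000]);

printf "dense:  diff=(%s) vector=%d bytes\n", "@$dense",  $dense_bytes;
printf "sparse: diff=(%s) vector=%d bytes\n", "@$sparse", $sparse_bytes;
```

Both calls return the same diff; only the size of the vector differs.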

IMO not very surprising.

update

found bug in benchmark, will correct later. vec still among slowest...

update

Thanks to BrowserUk for vividly commenting twice that the benchmark is broken, after I already mentioned that the benchmark is buggy.

Cheers Rolf

( addicted to the Perl Programming Language)

Replies are listed 'Best First'.
Re^5: Best method to diff very large array efficiently
by LanX (Saint) on Nov 27, 2013 at 01:21 UTC
    > found bug in benchmark, will correct later. vec still among slowest

    as promised:

    Integrity of the results is tested in a way that excludes possible side effects of the benchmark routines.
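A minimal sketch of such an integrity check, assuming two hypothetical candidates named after the benchmark labels (the real routines are not shown in this post). Each candidate receives fresh copies of the input and the sorted results are compared with Test::More.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical candidates: each returns the elements of A missing from B.
my %candidates = (
    hash_grep => sub {
        my ($a_ref, $b_ref) = @_;
        my %in_b;
        @in_b{@$b_ref} = ();
        return [ grep { !exists $in_b{$_} } @$a_ref ];
    },
    hash_key => sub {
        my ($a_ref, $b_ref) = @_;
        my %h;
        @h{@$a_ref} = ();
        delete @h{@$b_ref};
        return [ keys %h ];
    },
);

my @left  = (1 .. 10);
my @right = (2, 4, 6, 8, 10);

my %result;
for my $name (sort keys %candidates) {
    # Fresh copies, so one routine cannot alter another routine's input.
    my $out = $candidates{$name}->([@left], [@right]);
    $result{$name} = [ sort { $a <=> $b } @$out ];
}

is_deeply($result{hash_grep}, [1, 3, 5, 7, 9], 'hash_grep gives the expected diff');
is_deeply($result{hash_key}, $result{hash_grep}, 'hash_key agrees with hash_grep');
done_testing();
```

Sorting before comparing matters here because hash key order is not stable between implementations.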

    Benchmarks are run with different datasets with shrinking densities ("vec"-approach breaks >1e10)

    Result: "vec" doesn't scale well for larger values...

    Perlversion v5.10.0
    Setting values range: 1..1e4
    ok 1 - hash_grep ~~ hash_key (3231 entries)
    ok 2 - hash_key ~~ hash_values (3231 entries)
    ok 3 - hash_values ~~ using_vec (3231 entries)
    ok 4 - using_vec ~~ hash_grep (3231 entries)
    ok 5 - hash_grep ~~ hash_key (3231 entries)
    ok 6 - hash_key ~~ hash_values (3231 entries)
    ok 7 - hash_values ~~ using_vec (3231 entries)
    1..7

    Setting values range: 1..1e4
                  Rate hash_values hash_key using_vec hash_grep
    hash_values 29.8/s          --     -45%      -48%      -48%
    hash_key    53.8/s         81%       --       -7%       -7%
    using_vec   57.7/s         93%       7%        --       -0%
    hash_grep   57.7/s         93%       7%        0%        --

    Setting values range: 1..1e6
                  Rate hash_values hash_key using_vec hash_grep
    hash_values 28.5/s          --     -24%      -24%      -25%
    hash_key    37.3/s         31%       --       -0%       -2%
    using_vec   37.3/s         31%       0%        --       -2%
    hash_grep   38.2/s         34%       2%        2%        --

    Setting values range: 1..1e8
                  Rate using_vec hash_values hash_grep hash_key
    using_vec   17.1/s        --        -41%      -53%     -57%
    hash_values 29.1/s       70%          --      -20%     -26%
    hash_grep   36.2/s      112%         24%        --      -8%
    hash_key    39.5/s      131%         36%        9%       --

    Setting values range: 1..1e9
                  Rate using_vec hash_values hash_grep hash_key
    using_vec   3.29/s        --        -88%      -91%     -92%
    hash_values 28.2/s      759%          --      -21%     -27%
    hash_grep   36.0/s      993%         27%        --      -7%
    hash_key    38.8/s     1081%         38%        8%       --

    Compilation finished at Wed Nov 27 02:08:15
    code:

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re^5: Best method to diff very large array efficiently
by BrowserUk (Patriarch) on Nov 26, 2013 at 20:00 UTC
    > I avoided allocating useless arrays for the result

    Your benchmark is completely unrealistic, chalk & cheese comparison, and thus totally broken.

    1. keys & values used in void context:

      From perlfunc:

      • "In particular, calling keys() in void context resets the iterator with no other overhead."
      • "(In particular, calling values() in void context resets the iterator with no other overhead.)"
    2. Whilst grep in a void context probably avoids generating the return list, it doesn't stop it from iterating the entire input list.

    Benchmark code that doesn't actually produce the required result is broken. And broken, is just broken; of no value at all.
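To make the two points above concrete, here is a small sketch with hypothetical routine names (`keys_void`, `keys_list`, `grep_list`); it is not the code under discussion. The first routine calls keys() in void context, so no list is built; the other two store their result, which forces list context and produces something that can be checked.

```perl
use strict;
use warnings;

my %seen = map { $_ => 1 } 1 .. 1000;

# Broken pattern: nothing consumes the list, so keys() only resets the iterator.
sub keys_void {
    no warnings 'void';    # silence "Useless use of keys in void context"
    keys %seen;
    return;
}

# Fixed pattern: assigning to an array forces list context and builds the list.
sub keys_list {
    my @k = keys %seen;
    return scalar @k;
}

# grep with its result kept, so the filtered list is actually materialised.
sub grep_list {
    my @odd = grep { $_ % 2 } keys %seen;
    return scalar @odd;
}

printf "keys_list built %d keys, grep_list kept %d\n", keys_list(), grep_list();
```

Routines written like the last two could then be handed to `cmpthese` from the core Benchmark module, after first checking that they all return the same answer.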

    > Obviously vec scales badly the sparser the distribution of values become...

    Think again.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Best method to diff very large array efficiently
by BrowserUk (Patriarch) on Nov 26, 2013 at 20:34 UTC

    This just gets funnier and funnier.

    These are the results from kenosis's benchmark:

    C:\test>1064178-b.pl
                      Rate OPdiff hash_grep OPdiffModified OPdiff_undef using_vec
    OPdiff          90.9/s     --      -47%           -52%         -55%      -64%
    hash_grep        170/s    87%        --           -10%         -17%      -33%
    OPdiffModified   189/s   108%       11%             --          -7%      -26%
    OPdiff_undef     204/s   124%       20%             8%           --      -20%
    using_vec        254/s   180%       49%            35%          25%        --

    And these from your benchmark:

    C:\test>junk.pl
    hash_grep: *CMP::hash_grep
    hash_key_diff: *CMP::hash_key_diff
    hash_values_diff: *CMP::hash_values_diff
    using_vec: *CMP::using_vec
                         Rate using_vec hash_values_diff hash_key_diff hash_grep
    using_vec         69161/s        --             -81%          -84%      -88%
    hash_values_diff 363049/s      425%               --          -14%      -39%
    hash_key_diff    421810/s      510%              16%            --      -29%
    hash_grep        594454/s      760%              64%           41%        --

    Forget the order of the results; note that all your results are something like 2000 times faster, even though you're using pretty much the same sized arrays as kenosis. Do you have some magic touch that makes computers work so much harder for you?

    Oh, and look what your complicated way of building a hash of coderefs produces:

    my $href = pckg_subs();
    pp $href;
    {
      hash_grep        => sub { "???" },
      hash_key_diff    => sub { "???" },
      hash_values_diff => sub { "???" },
      using_vec        => sub { "???" },
    }

    Like I said. Broken is just broken.


