PerlMonks  

Re: Is there a difference in this declaration?

by kcott (Abbot)
on May 09, 2014 at 08:37 UTC (node #1085560)


in reply to Is there a difference in this declaration?

There's no difference. Both declare a hash with zero key/value pairs:

$ perl -Mstrict -Mwarnings -E '
    my %x;
    say scalar keys %x;
    say scalar values %x;
    my %y = ();
    say scalar keys %y;
    say scalar values %y;
'
0
0
0
0
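The question usually comes up inside loops, and the equivalence holds there too. `my` gives you a fresh, empty lexical hash each time the loop body is entered, so neither form carries data over from the previous iteration. Below is a minimal sketch; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Both declaration forms yield an empty hash. Because 'my' runs on every
# pass through the loop body, each iteration starts with a fresh, empty
# hash whichever form is used.
my @sizes;
for my $i (1 .. 3) {
    my %plain;            # declaration only
    my %assigned = ();    # declaration plus explicit empty-list assignment
    push @sizes, [ scalar(keys %plain), scalar(keys %assigned) ];
    $plain{$i}    = 1;    # populate both; these entries must not leak
    $assigned{$i} = 1;    # into the next iteration
}
print join(' ', map { "@$_" } @sizes), "\n";
```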

Including an assignment has some overhead: typically negligible but may be significant in looping code.

#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark qw{cmpthese};

cmpthese -1 => {
    no_assignment => sub { my %hash },
    assignment    => sub { my %hash = () },
};

Output:

                    Rate    assignment no_assignment
assignment     6672755/s            --          -60%
no_assignment 16770827/s          151%            --
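You can reproduce the percentage cells of a cmpthese table from the Rate column. Each cell says how much faster (positive) or slower (negative) the row's rate is than the column's, rounded to a whole percent. In this minimal sketch, `pct_cell` is a hypothetical helper. With the two rates above it gives the -60% and 151% shown.

```perl
use strict;
use warnings;

# One cmpthese-style cell: how much faster (+) or slower (-) the row's
# rate is than the column's rate, rounded to a whole percent.
sub pct_cell {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f%%', 100 * $row_rate / $col_rate - 100;
}

my ($assignment, $no_assignment) = (6_672_755, 16_770_827);
print pct_cell($assignment,    $no_assignment), "\n";
print pct_cell($no_assignment, $assignment),    "\n";
```

The same arithmetic also matches the 99.4/s versus 108/s table further down the thread (-8% and 9%).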

I wouldn't necessarily consider one form to be "more correct" than the other.

I generally use the "my %hash_data;" form.

[Minor Update: I removed "use autodie;" from the benchmark code as it wasn't necessary (it was an artefact from the last use of this script which I often rework for example code); retested; much the same results.]

-- Ken


Re^2: Is there a difference in this declaration?
by Anonymous Monk on May 09, 2014 at 08:41 UTC
    Ah OK,
    now I know! Thank you very much!
Re^2: Is there a difference in this declaration? (insignificant)
by tye (Cardinal) on May 09, 2014 at 14:57 UTC
    but may be significant in looping code

    No, not really. You've fallen for the classic fallacy that Benchmark's overblown attempts to "eliminate overhead" can often lead to. The huge values in the "rate" column are a good indicator.

    Let's test your theory by actually writing looping code and seeing how "significant" this difference can be. We need a loop that contains a useful hash declaration and can still complete something close to 6 million iterations each second. The loop must also get enough useful work done that almost no other code is required to produce a useful result, because any other code further dilutes the relative speed-up and so reduces its significance.

    When talking about a Perl operation that can happen 6 million times each second, it is pretty much impossible to make such a single operation be a non-trivial percentage of a useful script's run time. This is classic "micro optimization", a fool's errand.

    So, for a declaration of a hash to be useful, surely you have to insert something into the hash. Since it is a fresh declaration, you're also going to need to use the hash or else you'll be building up close to 6 million new hashes each second and will quickly run out of memory. And this needs to somewhat simulate useful code as speeding up useless code is not "significant", it is theory at best and more often just pointless. :)

    So, here is looping code that does nothing but add two entries to the hash. It isn't useful, but it is pretty darn minimal. Truly useful code is surely going to have to do more than this for the hash declaration to be a useful part of it.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark qw{cmpthese};

    cmpthese( -1 => {
        no_assignment => sub {
            for( 1..10_000 ) {
                my %hash;
                $hash{$_} = $_;
                $hash{-$_} = -$_;
            }
        },
        assignment => sub {
            for( 1..10_000 ) {
                my %hash = ();
                $hash{$_} = $_;
                $hash{-$_} = -$_;
            }
        },
    } );
    __END__
                    Rate    assignment no_assignment
    assignment    99.4/s            --           -8%
    no_assignment  108/s            9%            --

    Above is a typical result from a run of the script. In my experience, a 10% speed-up would be characterized as "something I'm quite unlikely to even notice" which falls a long way from "significant".

    The speed difference is small enough that I even got this result when I ran the script a few times to verify that my first results weren't atypical:

                    Rate no_assignment    assignment
    no_assignment 96.6/s            --           -3%
    assignment    99.4/s            3%            --

    Note that the "with assignment" code is the one that ran faster that time.

    Finally, a quick demonstration of why I think Benchmark.pm's attempts to "eliminate overhead" are overblown. With all of the insertions commented out, a typical result is:

                    Rate    assignment no_assignment
    assignment    1068/s            --          -37%
    no_assignment 1685/s           58%            --

    While your original code on my computer gives:

                        Rate    assignment no_assignment
    assignment    11967704/s            --          -49%
    no_assignment 23642004/s           98%            --

    ...and it takes noticeably longer to run. Benchmark has to rerun the code in a tight loop again and again, with ever-larger repetition counts. It does this because the times it measures are too close to "the time it takes to run empty code" for the result to be considered meaningful enough to report.

    When that happens, the results are nearly guaranteed to have no practical value.
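For contrast with Benchmark's overhead-subtracted rates, here is a crude alternative. Time a fixed number of calls with wall-clock time and subtract nothing, so the loop and sub-call overhead stay in the figure as they would in real code. This is a minimal sketch, not the method used in the thread. `wall_time` is a hypothetical helper, the call count is arbitrary, and wall-clock timings on a busy machine are noisy.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code $iterations times and return the elapsed wall-clock seconds.
# Nothing is subtracted: loop and call overhead are counted, as they would
# be in a real program.
sub wall_time {
    my ($code, $iterations) = @_;
    my $start = time;
    $code->() for 1 .. $iterations;
    return time - $start;
}

my $n        = 1_000_000;
my $plain    = wall_time(sub { my %hash },      $n);
my $assigned = wall_time(sub { my %hash = () }, $n);

printf "my %%hash;      %.3f s total, %.1f ns per call\n", $plain,    1e9 * $plain    / $n;
printf "my %%hash = (); %.3f s total, %.1f ns per call\n", $assigned, 1e9 * $assigned / $n;
```

Here every per-call figure also includes the cost of calling a closure. So the relative gap between the two forms is expected to look smaller than the percentages cmpthese reports.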

    Note that none of this is meant as much of a criticism of what you wrote. Based on the numbers you got, it certainly might have been possible to have a significant impact. Your statement was quite conservative. But my experience led me to doubt that such could happen, so I did a quick test to verify it.

    This case is actually rather close to the edge of what would make it possible for a real, useful script to end up 20% faster (the minimum to be noticeable, IME) from this change alone, and even such a script would likely be rather contrived. It is certainly extremely unlikely.

    The speed difference certainly looks to be insignificant to me.

    - tye        

      "You've fallen for the classic fallacy ..."

      Utter rubbish! I've "fallen" for no such thing.

      Before posting, I'd assumed the assignment incurred some overhead but also considered that an optimisation might have been applied to negate this. I chose to check it.

      The benchmark code indicated the overhead did exist: I posted the code and results to show this. I made no inferences nor offered any conclusions about the benchmark results.

      I wrote that the overhead was "typically negligible". I see that you excluded that from your opening quote.

      Anything, no matter how small, when multiplied enough times will become a bigger thing: that bigger thing "may be significant".

      -- Ken

        I see that you excluded that from your opening quote.

        I see that you excluded part of what I said from your opening quote:

        Note that none of this is meant as much of a criticism of what you wrote.

        But you certainly seemed to have taken it that way. Understandable, though, despite the disclaimer.

        But I will object to one new thing you added:

        Anything, no matter how small, when multiplied enough times will become a bigger thing: that bigger thing "may be significant".

        That math doesn't actually work very well when talking about code optimization. The more you multiply the code, the more the significance ends up being divided.

        Heck, you can't even multiply the results from Benchmark.pm by just 1. Benchmark.pm starts by telling you that something takes 150% (or 100%) more time. Then I multiply the code with a 10,000-iteration loop, and the difference is divided by 2 or 3. That brings it down close to the maximum difference you can actually achieve even with completely contrived code, because real code doesn't have the option of ignoring overhead the way Benchmark.pm tries to.
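The dilution argument can be put in numbers with a simple Amdahl's-law style calculation. This is a minimal sketch under stated assumptions:

- The saving is about 90 ns per declaration, derived from kcott's two Rate figures.
- The per-iteration totals are hypothetical.
- Roughly 1 µs per iteration is about what the 99.4/s rate of the 10,000-iteration loop above implies.
- `overall_speedup_pct` is an illustrative helper.

```perl
use strict;
use warnings;

# Overall speed-up, in percent, when a change saves $saved_s seconds out of
# every $total_s seconds of work (all other work is left unchanged).
sub overall_speedup_pct {
    my ($total_s, $saved_s) = @_;
    die "saved time must be less than total time\n" if $saved_s >= $total_s;
    return 100 * ($total_s / ($total_s - $saved_s) - 1);
}

my $saved = 90e-9;    # assumed ~90 ns saved per declaration

# Hypothetical time per loop iteration, including the declaration itself.
for my $total (1e-6, 10e-6, 100e-6) {
    printf "%4.0f us per iteration -> %.2f%% faster\n",
        $total * 1e6, overall_speedup_pct($total, $saved);
}
```

Under these assumptions, the program-level gain shrinks from roughly 10% to about 1% and then 0.1% as the other work per iteration grows tenfold each time.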

        - tye        

Re^2: Is there a difference in this declaration?
by ikegami (Pope) on May 14, 2014 at 15:20 UTC
    So the assignment takes about 0.00000009 seconds (roughly 90 nanoseconds) per declaration, if this is precise (which isn't likely given how tiny the figure is).
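The figure above is plain arithmetic on kcott's cmpthese output. A Rate is calls per second, so its reciprocal is seconds per call, and the difference of the two reciprocals is the extra cost of `= ()`. Below is a minimal sketch; `seconds_per_call` is a hypothetical helper.

```perl
use strict;
use warnings;

# Convert a Benchmark "Rate" (calls per second) into seconds per call.
sub seconds_per_call { return 1 / $_[0] }

# Rates taken from kcott's cmpthese output.
my %rate = (
    assignment    => 6_672_755,
    no_assignment => 16_770_827,
);

my $extra = seconds_per_call($rate{assignment})
          - seconds_per_call($rate{no_assignment});
printf "extra cost of '= ()': %.2e s (%.0f ns) per declaration\n",
    $extra, $extra * 1e9;
```

This works out to about 9.0e-08 s, i.e. roughly 90 ns, in line with the estimate above.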
