
Re^3: Concerning hash operations (appending, concatenating)

by japhy (Canon)
on Mar 24, 2005 at 17:50 UTC ( #442135=note )

in reply to Re^2: Concerning hash operations (appending, concatenating)
in thread Concerning hash operations (appending, concatenating)

Yeah, but I don't like it. I don't like assigning a hash to itself like the example does. I'd prefer a slightly more long-winded approach:

    $hash{$_} = $otherhash{$_} for grep !exists $hash{$_}, keys %otherhash;

or, using a hash slice:

    {
      my @new_keys = grep !exists $hash{$_}, keys %otherhash;
      @hash{@new_keys} = @otherhash{@new_keys};
    }
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Replies are listed 'Best First'.
Re^4: Concerning hash operations (appending, concatenating)
by RazorbladeBidet (Friar) on Mar 24, 2005 at 17:53 UTC
    But isn't that creating arrays and running loops that are unnecessary?
    "But what of all those sweet words you spoke in private?"
    "Oh that's just what we call pillow talk, baby, that's all."
      Here's my benchmark:
      use Benchmark 'cmpthese';
      use strict;

      my %hash = 1 .. 10_000;
      my %add  = 9_001 .. 11_000;

      cmpthese(-5, {
        BASE  => sub { my %copy = %hash; },
        JAPHY => sub {
          my %copy = %hash;
          my @new = grep !exists $copy{$_}, keys %add;
          @copy{@new} = @add{@new};
        },
        ZAXO  => sub { my %copy = %hash; %copy = (%add, %copy); },
        NULL  => sub { my %copy = %hash; %copy = (%copy); },
      });
      Here are the results I get:
                Rate  ZAXO  NULL JAPHY  BASE
      ZAXO    15.1/s    --  -22%  -61%  -76%
      NULL    19.3/s   28%    --  -50%  -69%
      JAPHY   38.9/s  157%  101%    --  -38%
      BASE    62.4/s  312%  223%   60%    --
      The two cases 'BASE' and 'NULL' show that Perl does not optimize %hash = %hash at all, since 'NULL' is three times slower than 'BASE'. 'JAPHY' is 2.5 times faster than 'ZAXO' in this run, with 500 old keys and 500 new keys. When I change %add to be (5_001 .. 12_000), the run times are:
                Rate  ZAXO  NULL JAPHY  BASE
      ZAXO    11.4/s    --  -39%  -58%  -81%
      NULL    18.6/s   63%    --  -31%  -69%
      JAPHY   26.9/s  136%   45%    --  -56%
      BASE    60.5/s  431%  225%  125%    --
      The point is that %a = (%b, %a) is slow because it is O(N+M) where N is the size of %a and M is the size of %b. My code is O(M) -- specifically, O(M(1+x)) where x is the fraction of keys in %b that aren't in %a -- because it doesn't iterate over %a at all, only %b.
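To make it concrete that the grep-based merge and Zaxo's %a = (%b, %a) produce the same result (both let the old hash's values win for shared keys) while only iterating differently, here's a minimal sketch with made-up two-key hashes:

```perl
use strict;
use warnings;

my %a = (a => 1, b => 2);
my %b = (b => 20, c => 30);

# japhy's merge: copy only the keys of %b that %a lacks
my %japhy = %a;
my @new = grep !exists $japhy{$_}, keys %b;
@japhy{@new} = @b{@new};

# Zaxo's merge: old values win because %zaxo comes last in the list
my %zaxo = %a;
%zaxo = (%b, %zaxo);

# Both yield (a => 1, b => 2, c => 30): b's original value survives.
```

The results are identical; the difference is that the slice version never walks the (potentially much larger) old hash.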
      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

        The grep is superfluous and, IMO, only distracts from the code. Here's the sub to add to the above benchmark:

        TANKTALUS => sub {
          my %copy = %hash;
          @copy{keys %add} = values %add;
        },
        And the benchmarks. First, the original 500/500 overlap/new keys:
                     Rate  ZAXO  NULL TANKTALUS JAPHY  BASE
        ZAXO       12.3/s    --  -23%      -59%  -59%  -76%
        NULL       16.1/s   30%    --      -46%  -47%  -68%
        TANKTALUS  29.8/s  142%   85%        --   -2%  -41%
        JAPHY      30.4/s  147%   89%        2%    --  -40%
        BASE       50.9/s  313%  216%       71%   67%    --
        As you can see, the difference is pretty insignificant compared to the extra "line noise" that needs to be parsed by the human who maintains it. I didn't think it'd change much with a larger sample, but I tried it anyway (using your 5_001..12_000 hash):
                     Rate  ZAXO  NULL JAPHY TANKTALUS  BASE
        ZAXO       9.48/s    --  -39%  -55%      -55%  -81%
        NULL       15.7/s   65%    --  -26%      -26%  -69%
        JAPHY      21.1/s  122%   35%    --       -0%  -59%
        TANKTALUS  21.1/s  123%   35%    0%        --  -59%
        BASE       51.1/s  439%  226%  142%      142%   --
        Somewhere, deep down in the lost precision, the non-grep version comes out ahead, which we'll attribute to statistical insignificance, just as I did for the 2% in the other direction. ;-)

        The other point is that your answer does not do the same thing as the rest of the examples here. In all the rest of the cases, the new hash always overrides the values in the old hash. That is, if the old hash had a => "b" while the new hash had a => "c", the behaviour of yours differs from the rest. Which one is correct depends on the demands of the situation, although the way I write code, I usually lean towards new hashes winning over old hashes. In cases where I want a "first come, first served" policy, I will definitely keep your grep in mind - it's probably a lot more elegant than the way I have been doing it (if I have - I'd have to look through old code to be sure). Thanks ;-)
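To see the semantic difference spelled out, here's a minimal sketch using the a => "b" / a => "c" example from above (hash names are made up for illustration):

```perl
use strict;
use warnings;

my %old = (a => 'b');
my %new = (a => 'c');

# "New wins": a plain slice assignment overwrites existing keys.
my %merged_new_wins = %old;
@merged_new_wins{keys %new} = values %new;
# %merged_new_wins is (a => 'c')

# "First come, first served": japhy's grep skips keys that already exist.
my %merged_old_wins = %old;
my @to_add = grep !exists $merged_old_wins{$_}, keys %new;
@merged_old_wins{@to_add} = @new{@to_add};
# %merged_old_wins is (a => 'b')
```

Neither is wrong; the grep buys the keep-existing-values policy at the cost of one extra pass over the new hash's keys.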
