Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked

how do I efficiently remove one hash from another?

by perltux (Monk)
on Nov 27, 2012 at 06:48 UTC ( #1005796=perlquestion: print w/replies, xml ) Need Help??
perltux has asked for the wisdom of the Perl Monks concerning the following question:

I have two simple hashes with several key=>value pairs. Now I want to remove all keys of %hash2 from %hash1 (if they also exist in %hash1, regardless if the values differ or not). Basically %hash1 should be left with only keys that don't exist in %hash2. %hash2 should not be modified itself.
Is there a more efficient way to do this other than a loop that goes through each key of %hash2 and removes it from %hash1 one at a time?
  • Comment on how do I efficiently remove one hash from another?

Replies are listed 'Best First'.
Re: how do I efficiently remove one hash from another?
by CountZero (Bishop) on Nov 27, 2012 at 07:05 UTC
    Use a hash slice: delete @hash1{keys %hash2};

    Update: deleted an unmatched ")". Thanks frozenwithjoy.


    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      Thanks, that's just what I was looking for!
Re: how do I efficiently remove one hash from another?
by davido (Archbishop) on Nov 27, 2012 at 07:11 UTC

    Whether the loop is explicit, or implicit, there's a loop. But one cool means is:

    delete @hash1{ keys %hash2 };

    ...which is pretty much the same thing as...

    delete $hash1{$_} for keys %hash2;

    ...but with an implicit loop (via the hash slice) rather than the explicit 'for' loop. Note in either case, there's no need to worry about checking exists: delete doesn't complain if the element doesn't already exist.


      The former compiles to a much smaller op tree than the latter:

      perl -MO=Concise -e'delete @hash1{keys %hash2}' perl -MO=Concise -e'delete $hash1{$_} for keys %hash2'

      On my machine, the slice performs about 20% faster than the loop.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        The slice option only calls delete once. It also doesn't have to go the trouble of assigning $_ for each element. So that makes sense.

        When's the last time you used duct tape on a duct? --Larry Wall
      Thanks for the detailed explanation.
Re: how do I efficiently remove one hash from another?
by rjt (Deacon) on Nov 27, 2012 at 09:28 UTC

    If you are concerned about looping through the values (as is common and necessary in all the ways to do what you've asked), is there any other way you can build up or maintain the data in the first place? Whenever I find an expensive operation in one of my programs, it's always worth going back to the data representation itself to see if there is a more efficient way of maintaining the data.

    For instance, if the hash keys do not change often, and you are more concerned about CPU than memory, separately maintaining %hash1_minus_hash2 might be better. Your "write" operation takes twice as long (still O(1), just with a higher constant), but your O(n) loop is now O(1), which can be a huge win.

    Also, if %hash1 is not more than twice as big as %hash2, and you're going to loop through %hash1 for the next step of your algorithm anyway (and you do this at most once per delete cycle), a construct like this might actually be more efficient:

    for (keys %hash1) { next if exists $hash2{$_}; # ... your main loop code here }

    There are of course many additional strategies that may be a better fit, but perhaps these two will spark an interest to take the problem up a level.

    And, it's entirely possible there's nothing else you can do in your case. If that's true, cheers anyway! :-)

Re: how do I efficiently remove one hash from another?
by Anonymous Monk on Nov 27, 2012 at 21:48 UTC
    When something needs to be in more than one structure at a time, the notion of "references" makes this very-easily done. There's only one item of data, with n references to it, and the data won't disappear from memory until the reference-count becomes zero. (But avoid "self-referencing" data objects; see "weakened" references.)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1005796]
Approved by Paladin
Front-paged by 2teez
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2018-05-25 15:04 GMT
Find Nodes?
    Voting Booth?