Re^5: Using hashes for set operations...

by jaredor (Deacon)
on May 26, 2011 at 03:39 UTC (#906743)


in reply to Re^4: Using hashes for set operations...
in thread Using hashes for set operations...

Such a kind response deserves a little extra effort. Below is my "cleanest" attempt at solving your puzzle. It was motivated by two facts: 1) I was wrong about the behavior of delete: it returns the deleted values, not the deleted keys, so the "value equals key" hash is necessary; and 2) I've read a few responses on PerlMonks that say hash elements that have undef as a value actually point to the same undef and thus potentially save some space. (I cannot find this stated in the documentation, but it is probably just that I don't know where to look; plus all the keywords that spring to mind are ubiquitous ;-)
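To make the first point concrete, here is a minimal sketch of what delete returns when applied to a hash slice. The hash %h and its contents are made up for illustration; they are not part of the puzzle.

```perl
use strict;
use warnings;

# Hypothetical data, only to show what delete hands back.
my %h = (a => 1, b => 2, c => 3);

# delete on a hash slice returns the deleted VALUES, in slice order;
# a key that was not present ('z' here) yields undef.
my @deleted = delete @h{qw(a c z)};

print join(',', map { defined $_ ? $_ : 'undef' } @deleted), "\n";  # 1,3,undef
print join(',', sort keys %h), "\n";                                # b
```

This is why the temporary hashes below store each element as both key and value: the values are what comes back from delete.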

  • Union is dead easy.
  • Intersection uses a temporary hash as well as a grep. I chose this way as being more succinct than efficient, but admit it violates your rules.
  • Symmetric Difference is about the same effort as two intersections; however, there is no grep!

All three constructs were written to be independent of the others, but obviously that need not be a constraint; thus symmetric difference could be as easy as taking a copy of the union and deleting the intersection (as in your approach). I decided to make intersection "easier" than symmetric difference because I think intersections are more common, at least in the type of code I write. Lastly, I have decided to have all the result hashes "look the same" in that every element value is undef--which may or may not be more efficient!
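For comparison, a sketch of the dependent variant mentioned above: symmetric difference taken as a copy of the union with the intersection deleted. The lookup hash %inA is my own helper name, and the intersection here uses exists rather than the delete trick, so treat it as one possible arrangement rather than the code I settled on.

```perl
use strict;
use warnings;

my @A = qw( 1 2 3 4 5 );
my @B = qw( 3 4 5 6 7 );

# Union: every element of A and B becomes a key with an undef value.
my %unionAB;
@unionAB{@A, @B} = ();

# Intersection: elements of B that also exist in a lookup hash built from A.
my %inA;
@inA{@A} = ();
my %interAB;
@interAB{ grep { exists $inA{$_} } @B } = ();

# Symmetric difference: copy the union, then delete the intersection.
my %sdiffAB = %unionAB;
delete @sdiffAB{ keys %interAB };

print join(' ', sort keys %sdiffAB), "\n";   # prints "1 2 6 7"
```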

That's all I can do. Thanks, this has been fun.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dump qw(pp);   # CPAN module, used only to print the results

my @A = qw( 1 2 3 4 5 );
my @B = qw( 3 4 5 6 7 );

# Union: every element of A and B becomes a key.
my %unionAB;
@unionAB{@A,@B} = undef;

# Intersection: delete returns the values of the keys of B found in A.
my %interAB;
{
    my (%tmpA);
    @tmpA{@A} = @A;
    @interAB{(grep {defined} delete @tmpA{@B})} = undef;
}

# Symmetric difference: what is left of A without B, plus B without A.
my %sdiffAB;
{
    my (%tmpA, %tmpB);
    @tmpA{@A} = @A;
    @tmpB{@B} = @B;
    delete @tmpA{@B};
    delete @tmpB{@A};
    @sdiffAB{keys %tmpA, keys %tmpB} = undef;
}

# View results
print 'unionAB: ', pp(\%unionAB), "\n";
print 'interAB: ', pp(\%interAB), "\n";
print 'sdiffAB: ', pp(\%sdiffAB), "\n";

(Sadly, I feel your point about adding detail. Most of my time not coding was spent trying to guard against the charge that I'm an idiot who doesn't know set theory. Of course if my beloved Dr. Kaiser were on perlmonks and posted such, I would have to agree, but since he loves PROLOG, I think I'm safe.)


Re^6: Using hashes for set operations...
by LanX (Canon) on May 27, 2011 at 15:44 UTC
    Thanks! :)

    > Lastly, I have decided to have all the result hashes "look the same" in that every element value is undef--which may or may not be more efficient!

    Unfortunately this collides with my requirement to support all kinds of sets:

    From the OP

    > This is fundamentally wrong because keys are stringifications and any reference type will not be reproduced (just think about arrays of objects or an AoH or ...)
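A small sketch of the problem being described, assuming a single made-up element $obj: once a reference is used as a hash key, only its string form survives.

```perl
use strict;
use warnings;

my $obj = { name => 'alice' };   # hypothetical set element

my %set;
$set{$obj} = undef;              # the key is the string "HASH(0x...)"

my ($key) = keys %set;
my $kind  = ref($key) ? 'still a reference' : 'plain string';
print "$kind: $key\n";

# $key->{name} would die under strict refs: the hash reference
# itself cannot be recovered from the key.
```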

    Cheers Rolf

      Ouch! You're right.

      This need to be a one-stop-shop for all data structures gets back to your informative dialog with John M. Dlugosz. However in that thread I feel your claim to focus on objects seems a bit disingenuous: If you have no equivalence method to call, then your intersections are always null. To use your imagery, elements in different sets of employees that could be twins are never identified.

      (Here I address a rebuttal that is not necessarily yours. I just like using second person in threads.)

      You then could claim that you are only interested in identity, not equivalence, and that you get identity from stringification; so non-trivial intersections are again a possibility.

      However, this claim would imply that stringification defines a bijection between objects and their hash keys.

      Ergo, hash keys are all you need! ;-)

      That being said, if you carry the values with you, then your gratification is more immediate, so despite my efforts and protests, we should just go ahead and scrap my attempts in the name of efficiency. This whole thread turns out to be a roundabout way of me agreeing to whatever it is that you and John M. Dlugosz hashed out (sic).
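What "carrying the values with you" might look like, as a sketch only: each set is a hash whose keys are the stringified elements and whose values are the elements themselves. The elements $x, $y, $z and the set names are invented for the example, and intersection here means identity (same reference), not equivalence.

```perl
use strict;
use warnings;

# Hypothetical elements; $y belongs to both sets.
my ($x, $y, $z) = ({ id => 1 }, { id => 2 }, { id => 3 });
my @A = ($x, $y);
my @B = ($y, $z);

my (%setA, %setB);
@setA{@A} = @A;   # key: stringified ref, value: the ref itself
@setB{@B} = @B;

# Intersection by identity: keys of A that also exist in B,
# with the original references carried along as values.
my @common_keys = grep { exists $setB{$_} } keys %setA;
my %interAB;
@interAB{@common_keys} = @setA{@common_keys};

my ($common) = values %interAB;
print "common id: $common->{id}\n";   # prints "common id: 2"
```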

        > However, this claim would imply that stringification defines a bijection between objects and their hash keys.
        >
        > Ergo, hash keys are all you need! ;-)

        in theory yes, practically no.

        There is no pure-Perl way to get the ref back from its stringification unless the relation is stored in a hash (sic).

        And yes, collisions can happen when strings and references are mixed in one set, which then creates the need for a specialized stringification.
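A sketch of such a collision and one possible specialized stringification. The helper set_key and its 'R:' / 'S:' prefixes are assumptions made up for this example; refaddr comes from the core module Scalar::Util. Undefined elements and objects with overloaded stringification are not handled.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $ref = [1, 2, 3];
my $str = "$ref";                # a plain string that looks like the ref

# Naive keys collide: both elements stringify identically.
my %naive;
@naive{ $ref, $str } = ($ref, $str);
print scalar(keys %naive), "\n";     # prints 1

# Hypothetical helper: tag each key with its kind so that
# strings and references cannot clash.
sub set_key {
    my ($elem) = @_;
    return ref $elem ? 'R:' . refaddr($elem) : 'S:' . $elem;
}

my %tagged;
$tagged{ set_key($_) } = $_ for $ref, $str;
print scalar(keys %tagged), "\n";    # prints 2
```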

        Cheers Rolf
