Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^4: Using hashes for set operations...

by LanX (Canon)
on May 24, 2011 at 14:09 UTC ( #906502=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Using hashes for set operations...
in thread Using hashes for set operations...

> apparently the delete function does return a list of keys it's deleted, so delete(@A{keys B}) does return the intersection of A and B ...

Good observation! I forgot about this.... so one gets the difference and the intersection with one operation, thats brilliant.

So a simple upgrade of delete with an extra switch to ignore undefs would give Perl a native primitive to handle set operations...

And I enjoyed your post too, I'm still meditating about how to make use of your approach...

> (Even though I am a failed mathematician, it does sting a little that you felt the need to give me a link to De Morgan's laws.)

The link wasn't meant for you, believe me. :)

I'm mathematician by training but not a native English speaker, so I need to elaborate my posts longer and stuff them with extra informations to avoid misunderstandings. (And the monastery is full of obsessed nitpickers...)

Cheers Rolf


Comment on Re^4: Using hashes for set operations...
Re^5: Using hashes for set operations...
by jaredor (Deacon) on May 26, 2011 at 03:39 UTC

    Such a kind response deserves a little extra effort. Below is my "cleanest" attempt at solving your puzzle. It was motivated by two facts 1) I was wrong about the behavior of delete in that it returns the deleted values, not the deleted keys, so the "value equal key" array is necessary and 2) I've read a few responses on perlmonks that say hash elements that have undef as a value actually point to the same undef and thus potentially save some space. (I cannot find this stated in the documentation, but it probably is that I just don't know where to look; plus all the keywords that spring to mind are ubiquitous ;-)

    • Union is dead easy.
    • Intersection uses a temporary hash as well as a grep. I chose this way as being more succinct than efficient, but admit it violates your rules.
    • Symmetric Difference is about the same effort as two intersections; however, there is no grep!

    All three constructs were written to be independent of the others, but obviously that need not be a constraint; thus symmetric difference could be as easy as taking a copy of the union and deleting the intersection (as in your approach). I decided to make intersection "easier" than symmetric difference because I think intersections are more common, at least in the type of code I write. Lastly, I have decided to have all the result hashes "look the same" in that every element value is undef--which may or may not be more efficient!

    That's all I can do. Thanks, this has been fun.

    #!/usr/bin/env perl use strict; use warnings; use Data::Dump qw(pp); my @A = qw( 1 2 3 4 5 ); my @B = qw( 3 4 5 6 7 ); my %unionAB; @unionAB{@A,@B} = undef; my %interAB; { my (%tmpA); @tmpA{@A} = @A; @interAB{(grep {defined} delete @tmpA{@B})} = undef; } my %sdiffAB; { my (%tmpA, %tmpB); @tmpA{@A} = @A; @tmpB{@B} = @B; delete @tmpA{@B}; delete @tmpB{@A}; @sdiffAB{keys %tmpA, keys %tmpB} = undef; } # View results print 'unionAB: ', pp(\%unionAB), "\n"; print 'interAB: ', pp(\%interAB), "\n"; print 'sdiffAB: ', pp(\%sdiffAB), "\n";

    (Sadly, I feel your point about adding detail. Most of my time not coding was spent trying to guard against the charge that I'm an idiot who doesn't know set theory. Of course if my beloved Dr. Kaiser were on perlmonks and posted such, I would have to agree, but since he loves PROLOG, I think I'm safe.)

      Thanks! :)

      > Lastly, I have decided to have all the result hashes "look the same" in that every element value is undef--which may or may not be more efficient!

      unfortunately this collides with my requirement to support all kind of sets:

      From the OP

      This is fundamentally wrong because keys are stringifications and any reference type will not be reproduced (just think about arrays of objects or a AoH or ...)

      Cheers Rolf

        Ouch! You're right.

        This need to be a one-stop-shop for all data structures gets back to your informative dialog with John M. Dlugosz. However in that thread I feel your claim to focus on objects seems a bit disingenuous: If you have no equivalence method to call, then your intersections are always null. To use your imagery, elements in different sets of employees that could be twins are never identified.

        (Here I address a rebuttal that is not necessarily yours. I just like using second person in threads.)

        You then could claim that you are only interested in identity, not equivalence, and that you get identity from stringification; so non-trivial intersections are again a possibility.

        However, this claim would imply that stringification defines a bijection between objects and their hash keys.

        Ergo, hash keys are all you need! ;-)

        That being said, if you carry the values with you, then your gratification is more immediate, so despite my efforts and protests, we should just go ahead and scrap my attempts in the name of efficiency. This whole thread turns out to be a roundabout way of me agreeing to whatever it is that you and John M. Dlugosz hashed out (sic).

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://906502]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (3)
As of 2014-07-31 02:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (244 votes), past polls