Beefy Boxes and Bandwidth Generously Provided by pair Networks DiBona
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Idiom: hashes as sets

by dragonchild (Archbishop)
on Jul 02, 2008 at 21:27 UTC ( #695203=note: print w/ replies, xml ) Need Help??


in reply to Idiom: hashes as sets

You're going to run into problems with non-scalar items, such as objects and hashrefs and listrefs. This is why you want to use a CPAN module. This is also true with the idiomatic (but wrong) unique'ing function that uses a hash. Much better is to use the uniq() function from List::MoreUtils which has been fixed to handle these issues.


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?


Comment on Re: Idiom: hashes as sets
Re^2: Idiom: hashes as sets
by Tanktalus (Canon) on Jul 03, 2008 at 16:00 UTC

    Ok, you've piqued my curiosity ... I've looked both at the perl and the XS versions of uniq, and can't discern the difference between it and the idiomatic my %seen; @uniq = grep {not $seen{$_}++} @random_values. Though, I have to grant that it took me an inordinate amount of time to figure out the difference in the XS code between the scalar context codepath and the list context codepath, so I could easily be missing a subtle nuance somewhere. Care to be more explicit in which way the idiomatic method is wrong?

      Here's uniq from List::MoreUtils:

      sub uniq (@) { my %h; map { $h{$_}++ == 0 ? $_ : () } @_; }

      I think this is equivalent to your "my %h; @uniq = grep { ! $h{$_}++ } @in;". What's important about both of them is that they don't do this:

      sub bad_uniq { my %h; @h{@in} = (); return keys %h; }

      ...which may be more succinctly (idiomatically) written as "keys %{{ map { $_ => 1 } @in }}".

      The difference is that keys will not return what was in the list to begin with but rather whatever those things come out as after being stringified. The solutions using map and grep both will stringify the stuff you feed them, but it's the stuff that's returned, and not the stringy leftovers that were used to tell which were duplicates.

      Update with a demonstration:

      use List::MoreUtils 'uniq'; use Data::Dumper; my $aref = [ 1, 2 ]; my @aref_duplicated = ( $aref, $aref, $aref ); my @u1 = uniq( @aref_duplicated ); my @u2 = keys %{{map {$_=>1} @aref_duplicated}}; print Data::Dumper->Dump( [ \@u1, \@u2 ], [ '*from_uniq', '*from_keys' ] ); __END__ @from_uniq = ( [ 1, 2 ] ); @from_keys = ( 'ARRAY(0x8153c28)' );

        Yes, that is an excellent point. I forgot to mention in the OP that I have only been using the idiom for sets of strings and integers. If I were storing references, I would be using a different system entirely -- or select an indexable attribute and make a separate hash out of that.

        --
        print "Just Another Perl Adept\n";

Re^2: Idiom: hashes as sets
by Jenda (Abbot) on Jul 06, 2008 at 21:06 UTC

    This is why you want to use a CPAN module if the items in your set are not strings or numbers.

    If they are and you do not need any advanced set operations, then a module would be an overkill.

      Unless, of course, the following reasons apply:
      • I want to future-proof myself against changing my requirements for membership.
      • I want to future-proof myself against changing my requirements for operations performed on my set.
      • I want to not worry about bugs in my code, but would rather use battle-tested code.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        • A set containing some strings and some objects doesn't make much sense and if you switch from a set of strings to a set of objects than the way the set is implemented will be the least of your worries. You'll have to change lots and lots of other things. Besides in such a case Tie::RefHash might very well be enough.
        • You can't presume all future requirements. So any futureproofing is bound to fail.
        • I doubt any module is as well tested as hash operations.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://695203]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2014-04-17 10:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (444 votes), past polls