Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Perl Idioms Explained - keys %{{map{$_=>1}@list}}

by broquaint (Abbot)
on Aug 04, 2003 at 12:55 UTC ( [id://280658]=perlmeditation: print w/replies, xml ) Need Help??

Perl Idioms Explained - keys %{{map{$_=>1}@list}}

my @list = qw/a b c d d a e b a b d e f/; my @uniq = keys %{{ map {$_ => 1} @list }}; print "@list\n@uniq\n"; __output__ a b c d d a e b a b d e f e f a b c d
The above idiom is a simple way of creating a list of unique values from another list, as the output of the code aptly demonstrates. However, with all those curly braces it may not be immediately obvious what's going on, so let's break it down.
map { $_ => 1 } @list
This is pretty straight-forward - create a list of key-value pairs where the keys are the values from @list.
{ map { $_ => 1 } @list }
The surrounding pair of curly braces creates an anonymous hash which is populated with they key-value pairs from the map statement. So we now have a hash reference to an anonymous hash who's keys are the elements from @list, and because hash keys are unique, the keys of the anonymous hash represent a unique set of values.
keys %{ { map { $_ => 1 } @list } }
Finally, with the last pair of curly braces the hash reference to the anonymous hash is dereferenced and we get its list of keys.

Caveats

Because this idiom creates a list, and then a hash, both of which are immediately disposed of, it's not suited for large lists. Also, because the keys of a hash aren't ordered, the list of unique values returned will be in a random order (although this can often be remedied with a simple sort).

Summary

A short (memory hungry) way to get a list of unique values from a given list.

_________
broquaint

Replies are listed 'Best First'.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by diotalevi (Canon) on Aug 04, 2003 at 13:23 UTC

    Instead of using `=> 1`, it is valid and cheaper to use `=> undef`.

      I'm not disagreeing, but I am curious. Why is it cheaper? One less node allocated? (Oh, I think I see... PL_sv_undef gets reused instead of a new SV?)

      This sounds like a job for those perl internals diagrams someone did at some point... Wish my memory was better.
      --
      Mike

        Yes. Instead of one scalar of the value '1' for every key, you get to share the same undef value for all the keys and thus don't have to allocate tons of memory you aren't going to use anyway. So keys @{{map { $_, undef } @ary}} has the potential to be a lot cheaper than keys @{{map { $_, 1 } @ary}}. The "a lot" part is just related to how many values are in your array. You burn one extra value for every entry in @ary and I'm just suggesting that you don't have to do that in this case.

Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Juerd (Abbot) on Aug 04, 2003 at 13:02 UTC

    In the same category, to see if $bar is in @foo:

    my %foo = map { $_ => 1 } @foo; if ($foo{$bar}) { ... }
    or faster and using less memory:
    my %foo; undef @foo{@foo}; if (exists $foo{$bar}) { ... }

    Of course, for a single test, grep is easier and often faster:

    if (grep $_ eq $bar, @foo) { ... }

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by thinker (Parson) on Aug 04, 2003 at 15:46 UTC

    Hi broquaint,

    or, to keep the order that the items are inserted (ie. the order in which the first instance of a value is encountered)

    my %seen; @uniq = grep ! $seen{$_}++, @list;

    cheers

    thinker

      This is the way I've usually seen it done. I was curious, so I ran a benchmark against the two.
      use Benchmark; my @list; for ( 0..9999 ) { push @list, sprintf "%d", 100 * rand ; } timethese( 1000, { 'keys_map' => sub { my @uniq = keys %{{ map {$_ => 1} @list }} +; }, 'grep_seen' => sub { my %seen; my @uniq = grep ! $seen{$_}+ ++, @list; }, } );
      Yields the following output.

      Benchmark: timing 1000 iterations of grep_seen, keys_map... grep_seen: 13 wallclock secs (11.17 usr + 0.00 sys = 11.17 CPU) @ 89 +.52/s (n=1000) keys_map: 30 wallclock secs (29.28 usr + 0.00 sys = 29.28 CPU) @ 34 +.15/s (n=1000)
        Adding
        'keys_map_undef' => sub { my @uniq = keys %{{ map {$_ => undef} @list +}}; },
        to test the undef suggestion, it turns out to be 15-20% faster than using 1.

        grep still wins, though.
        --
        Mike

      You can also grep lists for certain 'numbers' of entries (so if you wanted only the items that were in a list twice)
      @doubles = grep ++$seen{$_} == 2, @list;
      Jasper

        That would be 2 or more times right? Once ++$seen{$_} == 2 is true, you are immediatelly copying $_ to @doubles. So if it appeared a third time, you couldn't magically remove it again. Your code would be clearer if you used >= to emphasize that.

        The following would probably do if you only wanted entries with exactly 2 occurences:

        @doubles = grep $seen{$_} == 2, grep !$seen{$_}++, @list;

        But it is not pretty, or obvious. The first grep counts all occurences and only passes the first found entry to the next grep which will check to see how many were actually found.

        There must be a better way!

        - Cees

        No, that won't fly. You need
        $seen{$_}++ for @list; my @doubles = grep $seen{$_} == 2, @list;

        Makeshifts last the longest.

      Note that this is slightly broken as is. To be entirely correct, you have to say
      my (%seen, $seen_undef); my @uniq = grep defined() ? !$seen{$_}++ : !$seen_undef++, @list;
      Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken.

      Makeshifts last the longest.

        Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken.

        Well, so is every uniquification based on the keys of a hash! The point still stands that grep is faster than the original idiom presented, by orders of magnitude, if still as memory-hungry.

        ------
        We are the carpenters and bricklayers of the Information Age.

        The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

        Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by dragonchild (Archbishop) on Aug 04, 2003 at 14:06 UTC
    Along similar lines, I will often create a hash of keys which are needed for lookup. For example, I will do:
    my %acceptable_server = map { $_ => 1 } qw( DEV QA PROD );
    It's a nice way of working with a changing list of static acceptable things.

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 04, 2003 at 20:18 UTC
    Just as memory hungry, still not order preserving, and still broken with regards to undefined values, but faster approach:
    my %seen; @seen{@list}=(); my @uniq = keys %seen;
    And if you want to wrap it up without "leaking" lexicals:
    my @uniq = do { my %seen; @seen{@list}=(); keys %seen; };
    This approach does not (explicitly) iterate - more of the data structure setup work is done implicitly by perl.

    Makeshifts last the longest.

        undef is not documented to work on slices. That command should probably generate at least a warning.

        Caution: Contents may have been coded under pressure.

      This actually seems to be slightly quicker than the grep based solution:

      Benchmark: timing 100000 iterations of broquaint, grep_seen, aristotle +, juerd... broquaint: 25 wallclock secs (21.62 usr + 0.01 sys = 21.63 CPU) @ 46 +22.99/s (n=100000) grep_seen: 12 wallclock secs (11.55 usr + 0.00 sys = 11.55 CPU) @ 86 +60.26/s (n=100000) aristotle: 10 wallclock secs ( 9.05 usr + 0.00 sys = 9.05 CPU) @ 11 +044.84/s (n=100000) juerd: 9 wallclock secs ( 8.66 usr + 0.00 sys = 8.66 CPU) @ 11 +543.35/s (n=100000)
      on
      my @list = qw/a b c d d a e b a b d e f a erti wen udfgn wei sdej usdjdh iu t k a b b b b a a a a c d f e f s r s gkl h u s y t d s w s d log u log ds/;

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne

      Edit by castaway: Closed small tag in signature

        Yep. As I said, it does not iterate on the Perl side. The pertinent nodes in optree are only ever seen once, and the loop is implicit in the execution of that single op. With all other approaches, the pertinent ops have to be dispatched and executed as many times as there are items in the list.

        In fact, the keys %{{map{$_=>1}@list}} approach needs to first iterate over all elements of @list for the map, building a list as it goes, then build another list for keys.

        In contrast, the @uniq = grep ! $seen{$_}++, @list; approach only builds a list once, while iterating over the elements of @list.

        And of course as already explained, undef @seen{@list}; @uniq = keys %seen; just builds a single list - without iterating at all.

        Makeshifts last the longest.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://280658]
Approved by adrianh
Front-paged by Courage
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2024-03-19 08:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found