Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by diotalevi (Canon) on Aug 04, 2003 at 13:23 UTC
Instead of using `=> 1`, it is valid and cheaper to use `=> undef`.
Yes. Instead of one scalar holding the value 1 for every key, all of the keys share the same undef value, so you don't have to allocate a ton of memory you aren't going to use anyway. So keys %{{map { $_, undef } @ary}} has the potential to be a lot cheaper than keys %{{map { $_, 1 } @ary}}. How much cheaper depends on how many values are in your array: you burn one extra scalar for every entry in @ary, and I'm just suggesting that you don't have to do that in this case.
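For example, a minimal sketch of the cheaper variant (the sample @list is made up):

my @list = qw(a b c a b a);

# every key shares the one undef value instead of getting its own copy of 1
my @uniq = keys %{{ map { $_ => undef } @list }};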
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Juerd (Abbot) on Aug 04, 2003 at 13:02 UTC
In the same category, to see if $bar is in @foo:
my %foo = map { $_ => 1 } @foo;
if ($foo{$bar}) {
    ...
}
or faster and using less memory:
my %foo;
undef @foo{@foo};
if (exists $foo{$bar}) {
    ...
}
Of course, for a single test, grep is easier and often faster:
if (grep $_ eq $bar, @foo) {
    ...
}
Juerd
# { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by thinker (Parson) on Aug 04, 2003 at 15:46 UTC
Hi broquaint,
or, to keep the order in which the items were inserted (i.e. the order in which the first instance of each value is encountered):
my %seen;
@uniq = grep ! $seen{$_}++, @list;
cheers
thinker
This is the way I've usually seen it done. I was curious, so I ran a benchmark comparing the two.
use Benchmark;

my @list;
for ( 0..9999 ) {
    push @list, sprintf "%d", 100 * rand;
}

timethese(
    1000,
    {
        'keys_map'  => sub { my @uniq = keys %{{ map { $_ => 1 } @list }}; },
        'grep_seen' => sub { my %seen; my @uniq = grep ! $seen{$_}++, @list; },
    }
);
This yields the following output:
Benchmark: timing 1000 iterations of grep_seen, keys_map...
 grep_seen: 13 wallclock secs (11.17 usr + 0.00 sys = 11.17 CPU) @ 89.52/s (n=1000)
  keys_map: 30 wallclock secs (29.28 usr + 0.00 sys = 29.28 CPU) @ 34.15/s (n=1000)
I added

'keys_map_undef' => sub { my @uniq = keys %{{ map { $_ => undef } @list }}; },

to test the undef suggestion; it turns out to be 15-20% faster than using 1.
grep still wins, though.
--
Mike
You can also grep a list for items that occur a certain number of times (say, only the items that appear in the list twice):
my %seen;
@doubles = grep ++$seen{$_} == 2, @list;
Jasper
That would be 2 or more times, right? Once ++$seen{$_} == 2 is true, $_ is immediately copied to @doubles, so if the value appeared a third time you couldn't magically remove it again. Your code would be clearer if you used >= to emphasize that.
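For illustration, with some made-up data:

my @list = qw(a b a b a);   # 'a' occurs three times
my %seen;
my @doubles = grep ++$seen{$_} == 2, @list;
# @doubles is ('a', 'b'): 'a' is still included even though it occurs more than twice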
The following would probably do if you only wanted entries with exactly 2 occurrences:
@doubles = grep $seen{$_} == 2, grep !$seen{$_}++, @list;
But it is not pretty, or obvious. The inner grep (the one on the right) counts all occurrences and passes only the first instance of each entry to the outer grep, which then checks how many were actually found.
There must be a better way!
- Cees
No, that won't fly. You need
$seen{$_}++ for @list;
my @doubles = grep $seen{$_} == 2, @list;
Makeshifts last the longest.
Note that this is slightly broken as is. To be entirely correct, you have to say
my (%seen, $seen_undef);
my @uniq = grep defined() ? !$seen{$_}++ : !$seen_undef++, @list;
Of course, if you're fiddling with objects which cannot be compared for equality by stringification, it is still broken.
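In that case you could key the hash on whatever does identify the objects; a rough sketch, where the ->id accessor is only an assumption for the example:

my %seen;
my @uniq = grep !$seen{ $_->id }++, @objects;   # assumes each object has an id() method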
Makeshifts last the longest.
Of course, if you're fiddling with objects which cannot be compared for equality by stringification, it is still broken.
Well, so is every uniquification based on the keys of a hash! The point still stands that grep is considerably faster than the original idiom presented (more than twice as fast in the benchmark above), if still as memory-hungry.
------
We are the carpenters and bricklayers of the Information Age.
The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by dragonchild (Archbishop) on Aug 04, 2003 at 14:06 UTC
Along similar lines, I will often create a hash of keys which are needed for lookup. For example, I will do:
my %acceptable_server = map { $_ => 1 } qw(
DEV
QA
PROD
);
It's a nice way of working with a static list of acceptable values that changes only as the code does.
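Possible usage, where $server is just a hypothetical variable for the example:

die "Not an acceptable server: $server\n"
    unless $acceptable_server{$server};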
------
We are the carpenters and bricklayers of the Information Age.
The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 04, 2003 at 20:18 UTC
Just as memory-hungry, still not order-preserving, and still broken with regard to undefined values, but a faster approach:
my %seen;
@seen{@list} = ();
my @uniq = keys %seen;
And if you want to wrap it up without "leaking" lexicals:
my @uniq = do {
    my %seen;
    @seen{@list} = ();
    keys %seen;
};
This approach does not (explicitly) iterate; more of the data structure setup work is done implicitly by perl.
Makeshifts last the longest.
@seen{@list} = ();
A bit faster:
undef @seen{@list};
Juerd
# { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Benchmark: timing 100000 iterations of broquaint, grep_seen, aristotle, juerd...
 broquaint: 25 wallclock secs (21.62 usr + 0.01 sys = 21.63 CPU) @  4622.99/s (n=100000)
 grep_seen: 12 wallclock secs (11.55 usr + 0.00 sys = 11.55 CPU) @  8660.26/s (n=100000)
 aristotle: 10 wallclock secs ( 9.05 usr + 0.00 sys =  9.05 CPU) @ 11044.84/s (n=100000)
     juerd:  9 wallclock secs ( 8.66 usr + 0.00 sys =  8.66 CPU) @ 11543.35/s (n=100000)
on

my @list = qw/a b c d d a e b a b d e f a erti wen udfgn
    wei sdej usdjdh iu t k a b b b b a a a a c d f e f s r s
    gkl h u s y t d s w s d log u log ds/;
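The benchmark code itself isn't shown, but it presumably followed the same timethese pattern as earlier in the thread; a sketch of what it likely looked like (the sub bodies are my reconstruction of the four approaches being compared):

use Benchmark;

timethese( 100000, {
    broquaint => sub { my @uniq = keys %{{ map { $_ => 1 } @list }}; },
    grep_seen => sub { my %seen; my @uniq = grep !$seen{$_}++, @list; },
    aristotle => sub { my %seen; @seen{@list} = (); my @uniq = keys %seen; },
    juerd     => sub { my %seen; undef @seen{@list}; my @uniq = keys %seen; },
});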
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Yep. As I said, it does not iterate on the Perl side. The pertinent nodes in the optree are only ever visited once, and the loop is implicit in the execution of that single op. With all the other approaches, the pertinent ops have to be dispatched and executed as many times as there are items in the list.
In fact, the keys %{{map{$_=>1}@list}} approach needs to first iterate over all elements of @list for the map, building a list as it goes, then build another list for keys.
In contrast, the @uniq = grep ! $seen{$_}++, @list; approach only builds a list once, while iterating over the elements of @list.
And of course, as already explained, undef @seen{@list}; @uniq = keys %seen; just builds a single list, without iterating at the Perl level at all.
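If you want to see this for yourself, one way (my suggestion, not something from the thread) is to dump the optree with B::Concise:

perl -MO=Concise -e 'my %seen; my @list = (1 .. 5); undef @seen{@list}; my @uniq = keys %seen;'

The hash slice should show up as a single hslice op with no surrounding Perl-level loop.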
Makeshifts last the longest.