PerlMonks  

Re: Using foreach to process a hash

by DigitalKitty (Parson) on Oct 22, 2006 at 06:03 UTC (node #579834)


in reply to Using foreach to process a hash

Hi all.

Welcome to the Monastery, iridius. I took the liberty of including two sources of information you might find beneficial:

DevShed Perl articles

Beginning Perl

The latter is a free online text (written for v5.6.1) designed to teach the beginning Perl programmer. Feel free to download any chapter you'd like (or all of them).

:)

Concerning your question, jdporter answered it but I felt obligated to contribute. If you're processing a hash of considerable size, obviously you'd prefer to use the most efficient functions possible. In order to accomplish this end, the standard perl library includes a module called Benchmark (please see below for an example):

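#!/usr/bin/perl
use warnings;
use strict;
use Benchmark;

# The hash must be filled in before timethese() is called; if the
# assignment came after the call, the subs would loop over an empty hash.
my %monks = (
    jdporter    => 'Prior',
    tye         => 'Bishop',
    bobf        => 'Vicar',
    planetscape => 'Vicar',
    belg4mit    => 'Parson',
);

# Run each sub 1,000,000 times and report the timings
timethese(1_000_000, {
    foreach_loop    => \&foreach_loop,
    while_each_loop => \&while_each_loop,
});

# Iterate over the keys and look each value up in the hash
sub foreach_loop {
    foreach my $key ( keys %monks ) {
        print "$key => $monks{$key}", "\n";
    }
}

# Fetch key/value pairs directly, one pair per call to each()
sub while_each_loop {
    while ( my ( $key, $value ) = each %monks ) {
        print "$key => $value", "\n";
    }
}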

Output:



Benchmark: timing 1000000 iterations of foreach_loop, while_each_loop...

foreach_loop: 0 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 864304.24/s (n=1000000)
while_each_loop: 1 wallclock secs ( 0.51 usr + 0.00 sys = 0.51 CPU) @ 1941747.57/s (n=1000000)

To see which loop is more efficient, compare the total CPU time on each line (the 'usr + sys = n.nn CPU' figure) or, equivalently, the rate after the @ sign. timethese() runs each piece of code as many times as its first argument specifies, then reports the elapsed wallclock time, the user and system CPU time, and a rate in iterations per second (the iteration count divided by the CPU time). As the output shows, the while_each loop ran faster here, which makes it a sensible choice when iterating over an exceptionally large hash.
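Benchmark can also lay the comparison out for you. The sketch below is not from the original post and rests on a few assumptions: the hash %ranks and the sub names are hypothetical, the hash is filled before any timing begins, and the subs avoid print so that I/O does not swamp the measurement. cmpthese() (which must be imported explicitly) prints a chart of rates plus the percentage by which each entry is faster or slower than the others, and also returns that chart as a reference to an array of rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);    # cmpthese is exported only on request

# Hypothetical test data, filled in before any timing starts
my %ranks = map { ( "monk$_" => "rank$_" ) } 1 .. 1_000;

# Prints the comparison chart and returns it as an array of rows
my $chart = cmpthese( 2_000, {
    foreach_keys => sub {
        my $count = 0;
        foreach my $monk ( keys %ranks ) {
            my $rank = $ranks{$monk};    # look the value up by key
            $count++;
        }
    },
    while_each => sub {
        my $count = 0;
        while ( my ( $monk, $rank ) = each %ranks ) {    # pair fetched directly
            $count++;
        }
    },
});
```

The first row of the returned chart holds the column headings; each following row starts with the name of one of the subs.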

Update: If you see '(warning: too few iterations for a reliable count)' in the output, simply increase the iteration count passed to timethese().
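Instead of guessing a bigger iteration count, you can pass timethese() a negative count, which Benchmark treats as a minimum number of CPU seconds to spend on each piece of code. A minimal sketch, assuming a hypothetical %ranks hash and hypothetical sub names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;    # exports timethese() by default

# Hypothetical test data
my %ranks = map { ( "monk$_" => "rank$_" ) } 1 .. 1_000;

# -1 means: run each sub for at least 1 CPU second, however many
# iterations that takes, instead of a fixed iteration count
my $results = timethese( -1, {
    foreach_keys => sub { my $n = 0; $n++ for keys %ranks },
    while_each   => sub { my $n = 0; while ( my ($k) = each %ranks ) { $n++ } },
});

# Each value is a Benchmark object; iters() reports how many runs happened
printf "%-12s %d iterations\n", $_, $results->{$_}->iters for sort keys %$results;
```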

2nd Update: Thanks, jdporter and blazar. I rushed through this example and hadn't noticed the small bugs I introduced in doing so.

Hope this helps,

~Katie


Re^2: Using foreach to process a hash
by blazar (Canon) on Oct 22, 2006 at 13:30 UTC
    "Concerning your question, jdporter answered it but I felt obligated to contribute. If you're processing a hash of considerable size, obviously you'd prefer to use the most efficient functions possible. In order to accomplish this end, the standard perl library includes a module called Benchmark (please see below for an example):"

    I beg to differ. I've made similar interventions before and I know the subject is controversial, but I don't mind being downvoted. Don't misunderstand me: Benchmark.pm is great and I use it quite often, that is, whenever I really need it. Indeed, had the OP asked for 'efficiency', it might have been a perfectly sensible answer. But whenever someone asks for 'efficiency' an alarm bell should ring, and often does: the real question is whether efficiency is relevant at all in the situation under consideration. Sometimes it is, sometimes it's not. In the latter case it often turns out to be yet another instance of the obsession with premature optimization which, as we all know, is the root of all evil in programming.

    Now the point is, I see a risk in pointing a newbie like iridius towards these issues: precisely the risk of creating or feeding that obsession with premature optimization...

    OTOH your code has a bug:

    C:\temp>dk
    Benchmark: timing 1000000 iterations of foreach_loop, while_each_loop...
    foreach_loop: 0 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 927643.78/s (n=1000000)
    while_each_loop: 0 wallclock secs ( 0.50 usr + 0.00 sys = 0.50 CPU) @ 2000000.00/s (n=1000000)

    C:\temp>perl -w dk.pl
    Name "main::hash" used only once: possible typo at dk.pl line 17.
    Benchmark: timing 1000000 iterations of foreach_loop, while_each_loop...
    foreach_loop: 2 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 927643.78/s (n=1000000)
    while_each_loop: 1 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU) @ 1941747.57/s (n=1000000)

    C:\temp>perl -wMstrict dk.pl
    Global symbol "%hash" requires explicit package name at dk.pl line 17.
    Execution of dk.pl aborted due to compilation errors.

    Indeed, you forgot to change $hash{$key} to $monks{$key} in code lifted from a previous example, I suppose. In this case it doesn't alter the results much, but in other cases a similar error may well alter them, and significantly.
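The kind of slip described here can be reproduced in a few lines. This hypothetical sketch (not from the original posts) compiles the same mistyped lookup twice through a string eval: without strict it silently reads an empty package hash, and with strict it fails to compile with the same 'Global symbol' error shown in the console output.

```perl
#!/usr/bin/perl
use warnings;
# No "use strict" at file level on purpose: each eval below chooses its own.

# Hypothetical snippet with the same mistake: the hash is %monks,
# but the lookup says $hash{...}
my $buggy = q{
    my %monks = ( tye => 'Bishop' );
    my $key   = 'tye';
    my $rank  = $hash{$key};    # typo: should be $monks{$key}
    defined $rank ? $rank : 'undef';
};

# Without strict, $hash{$key} quietly reads an empty package hash
my $result = eval "no strict; $buggy";
print "without strict: $result\n";

# With strict, the same code refuses to compile and $@ holds the error
eval "use strict; $buggy";
print "with strict: $@";
```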

    So I felt obligated to contribute too, but to remind the OP that, as the above clearly shows, just inserting

    use strict; use warnings;

    at the top of just about all of his scripts is probably the best way to avoid many common programming mistakes. It simply tells perl to give you all the help it can in avoiding them.
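strict catches undeclared variables; warnings catches a different class of slips, such as using a hash entry that doesn't exist. A minimal sketch, assuming a hypothetical mistyped key 'Tye', that captures the warning through $SIG{__WARN__} so it can be inspected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %monks = ( tye => 'Bishop', bobf => 'Vicar' );

# Collect warnings instead of letting them go straight to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical slip: the key is 'tye', not 'Tye', so the lookup yields undef
# and "use warnings" reports an uninitialized value in the string
my $line = "Tye => $monks{Tye}";

print "caught: $_" for @warnings;
```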

    One last piece of advice I can give him, also inspired by this example, is to use descriptive names for his variables, e.g. $monk instead of $key and $rank instead of $value. The rationale is that if the code doesn't make sense when mentally translated into English, then chances are it's wrong...
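Applied to the while/each loop from the benchmark, the naming advice might look like this minimal sketch (a trimmed-down %monks hash; the sentence format is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %monks = (
    jdporter => 'Prior',
    tye      => 'Bishop',
    bobf     => 'Vicar',
);

# Descriptive names read like English: "each monk has a rank"
my @lines;
while ( my ( $monk, $rank ) = each %monks ) {
    push @lines, "$monk is a $rank";
}

# Hash order is unpredictable, so sort before printing
print "$_\n" for sort @lines;
```

This prints "bobf is a Vicar", "jdporter is a Prior" and "tye is a Bishop", one per line.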

Re^2: Using foreach to process a hash
by jdporter (Canon) on Oct 22, 2006 at 14:06 UTC

    Unfortunately, DigitalKitty's benchmark suffers from a couple of major flaws. First, the subs under test are doing prints, which means the timing is going to be swamped by I/O. Second, the hash is so small that, among the ops performed by each sub, the ones we're trying to test (each and keys) occur very few times relative to the overhead of calling the sub, etc.

    So I offer the following benchmark, which eliminates both of those sources of error.

    use Benchmark;

    # Slurp the word list, one word (with its trailing newline) per element
    my @words = do { local @ARGV = ( 'mondo_word_list.txt' ); <> };

    # Build a large hash whose keys and values are the words themselves
    my %w;
    @w{@words} = @words;
    @words = keys %w;    # de-duplicated list for the plain-array comparison
    print scalar(keys %w), " words\n";

    timethese( 10, {
        foreach_loop       => \&foreach_loop,
        foreach_loop_novar => \&foreach_loop_novar,
        while_each_loop    => \&while_each_loop,
        array              => \&array,
    });

    # Each sub only copies into the globals $a and $b, so no I/O is timed
    sub foreach_loop {
        for my $key ( keys %w ) {
            $a = $key;
            $b = $w{$key};
        }
    }

    sub foreach_loop_novar {
        for ( keys %w ) {
            $a = $_;
            $b = $w{$_};
        }
    }

    sub while_each_loop {
        while ( my ( $key, $val ) = each %w ) {
            $a = $key;
            $b = $val;
        }
    }

    sub array {
        for ( @words ) {
            $a = $_;
            $b = $_;
        }
    }

    Output: (slightly edited)

    311142 words
    foreach_loop: 7 wallclock secs ( 6.22 usr
    foreach_loop_novar: 6 wallclock secs ( 6.17 usr
    while_each_loop: 5 wallclock secs ( 5.13 usr
    array: 1 wallclock secs ( 1.08 usr

    As you can see, for large hashes while/each wins over for/keys. You also gain a little by using the default loop variable $_ in the for loop instead of a named one.
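The benchmark above needs a mondo_word_list.txt file to run. As a self-contained variant, this sketch substitutes a hypothetical list of 100,000 synthetic words (an assumption, not jdporter's data) and keeps the same four loops as anonymous subs, with no I/O inside the timed code. Absolute numbers will differ from those posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

# Hypothetical stand-in for mondo_word_list.txt: 100,000 synthetic words
my @words = map { sprintf 'word%06d', $_ } 1 .. 100_000;
my %w;
@w{@words} = @words;

# Each sub only copies into the globals $a and $b (exempt from strict),
# so no I/O is timed; 10 passes over the data, as in the original
my $results = timethese( 10, {
    foreach_loop       => sub { for my $key ( keys %w ) { $a = $key; $b = $w{$key} } },
    foreach_loop_novar => sub { for ( keys %w ) { $a = $_; $b = $w{$_} } },
    while_each_loop    => sub { while ( my ( $key, $val ) = each %w ) { $a = $key; $b = $val } },
    array              => sub { for (@words) { $a = $_; $b = $_ } },
});
```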

    We're building the house of the future together.
