http://www.perlmonks.org?node_id=1008316


in reply to Re: better array to hash conversion
in thread better array to hash conversion

As the array size grows, it doesn't take long for the OPs original to out pace variation1. It only requires a 200,000 or so for that to happen, and the benefits mount geometrically as the array size grows:

#!/usr/bin/env perl use strict; use warnings; use Benchmark qw(:all); our @array = 'aaaa' .. 'lzzz'; print "$#array\n"; sub original { my %hash; for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx;} } sub variation1 { my %hash; @hash{ @array } = 0 .. $#array; } sub variation2 { my %hash = map { $array[$_] => $_ } 0..$#array; } sub variation3 { my $idx = 0; my %hash = map { $_ => $idx++ } @array; } sub variation4 { my $idx = 0; my %hash; $hash{ $_ } = $idx++ for @array; } cmpthese -5, { 'original' => \&original, 'variation1' => \&variation1, 'variation2' => \&variation2, 'variation3' => \&variation3, 'variation4' => \&variation4, }; __END__ C:\test>junk91 210911 Rate variation2 variation3 variation1 original variatio +n4 variation2 2.08/s -- -2% -36% -38% -4 +2% variation3 2.12/s 2% -- -35% -37% -4 +1% variation1 3.26/s 57% 54% -- -3% - +9% original 3.37/s 62% 59% 3% -- - +6% variation4 3.57/s 72% 68% 9% 6% +--

(I've added another variation that works better for large arrays.)


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Replies are listed 'Best First'.
Re^3: better array to hash conversion
by karlgoethebier (Abbot) on Dec 11, 2012 at 15:30 UTC

    This is very interesting. Do you know why - and like to explain it?

    Thanks and best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Because all of the map-based methods construct, copy and later discard, multiple, intermediary lists on their way to constructing the hash.

      And the costs of doing large numbers of small memory allocations and deallocations add up; especially if they memory manager has to go to the OS a couple of times to expand the process memory pool.

      Using for avoids constructing many of the intermediary lists.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        The following map variant works out significantly faster than the previously posted map variants, though not as fast as for-loop based variants:

        sub variation3ep { my $idx = -1; my %hash = map(($_ => ++$idx), @array); }

        While the block form of map is often favoured for its clarity, the expression form of map is faster because it avoids the overhead of creating a lexical pad.

        This is actually another reason why BrowserUk's variation4 is fast - he uses the for statement modifier rather than a for loop with a block. Compare:

        #!/usr/bin/env perl use strict; use warnings; use Benchmark qw(:all); my @array='aa' .. 'zz'; sub variation4 { my $idx = 0; my %hash; $hash{ $_ } = $idx++ for @array; } sub variation5 { my $idx = 0; my %hash; for (@array) { $hash{ $_ } = $idx++; } } cmpthese(-3, { 'variation4' => \&variation4, 'variation5' => \&variation5, }); __END__ Rate variation5 variation4 variation5 740/s -- -4% variation4 773/s 5% --

        Yes, it's a small difference, but it's pretty consistently observable.

        The absolute fastest I've been able to achieve is a small variation on BrowserUk's variation4 using the prefix increment rather than postfix increment:

        sub variation6 { my $idx = -1; my %hash; $hash{ $_ } = ++$idx for @array; }

        It seems to give you about a 6% speed up.

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Thank you very much and best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»