http://www.perlmonks.org?node_id=1008327


in reply to Re^2: better array to hash conversion
in thread better array to hash conversion

This is very interesting. Do you know why - and like to explain it?

Thanks and best regards, Karl

«The Crux of the Biscuit is the Apostrophe»

Replies are listed 'Best First'.
Re^4: better array to hash conversion
by BrowserUk (Patriarch) on Dec 11, 2012 at 15:38 UTC

    Because all of the map-based methods construct, copy and later discard, multiple, intermediary lists on their way to constructing the hash.

    And the costs of doing large numbers of small memory allocations and deallocations add up; especially if they memory manager has to go to the OS a couple of times to expand the process memory pool.

    Using for avoids constructing many of the intermediary lists.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      The following map variant works out significantly faster than the previously posted map variants, though not as fast as for-loop based variants:

      sub variation3ep { my $idx = -1; my %hash = map(($_ => ++$idx), @array); }

      While the block form of map is often favoured for its clarity, the expression form of map is faster because it avoids the overhead of creating a lexical pad.

      This is actually another reason why BrowserUk's variation4 is fast - he uses the for statement modifier rather than a for loop with a block. Compare:

      #!/usr/bin/env perl use strict; use warnings; use Benchmark qw(:all); my @array='aa' .. 'zz'; sub variation4 { my $idx = 0; my %hash; $hash{ $_ } = $idx++ for @array; } sub variation5 { my $idx = 0; my %hash; for (@array) { $hash{ $_ } = $idx++; } } cmpthese(-3, { 'variation4' => \&variation4, 'variation5' => \&variation5, }); __END__ Rate variation5 variation4 variation5 740/s -- -4% variation4 773/s 5% --

      Yes, it's a small difference, but it's pretty consistently observable.

      The absolute fastest I've been able to achieve is a small variation on BrowserUk's variation4 using the prefix increment rather than postfix increment:

      sub variation6 { my $idx = -1; my %hash; $hash{ $_ } = ++$idx for @array; }

      It seems to give you about a 6% speed up.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        s/(?=significantly)/in/

        In the unlikely event of having hundreds of thousands of entries, the run-time will be a fraction of a second faster (and nobody will notice or care). In the very unlikely event of this particular operation being even as much as 10% of the total run-time, the difference between the worst and the best shown so far will result in a 5% reduction in total run-time (which nobody will notice or care about).

        - tye        

        Thank you very much and best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

      Thank you very much and best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»