Re: better array to hash conversion

by ruzam (Curate)
on Dec 11, 2012 at 14:06 UTC


in reply to better array to hash conversion

A little benchmarking can be helpful for the case of converting an array to a hash, where the array values become keys and the array indexes become values.

I've benchmarked the OP's code as well as the variations presented in the responses, and I added my own solution to the problem (variation3). For this eight-element array, the clear winner is variation1.

my %hash; @hash{ @array } = 0 .. $#array;
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(:all);

my @array = qw(a b c d e f g h);

sub original {
    my %hash;
    for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx; }
}

sub variation1 {
    my %hash;
    @hash{ @array } = 0 .. $#array;
}

sub variation2 {
    my %hash = map { $array[$_] => $_ } 0..$#array;
}

sub variation3 {
    my $idx = 0;
    my %hash = map { $_ => $idx++ } @array;
}

cmpthese(-10, {
    'original'   => sub{ original() },
    'variation1' => sub{ variation1() },
    'variation2' => sub{ variation2() },
    'variation3' => sub{ variation3() },
});
results:
               Rate variation2 variation3   original variation1
variation2 142570/s         --       -15%       -35%       -49%
variation3 168018/s        18%         --       -24%       -40%
original   220185/s        54%        31%         --       -21%
variation1 279147/s        96%        66%        27%         --
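Before trusting the timings, it is worth confirming that all four variations build the same hash. The following is a minimal sketch, not part of the original post: the subs are changed to return a hash reference, and `as_string` is a hypothetical helper introduced only for the comparison.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @array = qw(a b c d e f g h);

# Same bodies as in the benchmark, but each returns a reference to its hash
# (an assumption made here so the results can be inspected).
sub original {
    my %hash;
    for (my $idx = 0; $idx < @array; $idx++) { $hash{ $array[$idx] } = $idx; }
    return \%hash;
}
sub variation1 { my %hash; @hash{@array} = 0 .. $#array; return \%hash; }
sub variation2 { my %hash = map { $array[$_] => $_ } 0 .. $#array; return \%hash; }
sub variation3 { my $idx = 0; my %hash = map { $_ => $idx++ } @array; return \%hash; }

# Hypothetical helper: serialise a hash with sorted keys so that two
# hashes can be compared as plain strings.
sub as_string {
    my ($h) = @_;
    return join ',', map { $_ . '=' . $h->{$_} } sort keys %$h;
}

my %builders = (
    variation1 => \&variation1,
    variation2 => \&variation2,
    variation3 => \&variation3,
);

my $expected = as_string( original() );
for my $name (sort keys %builders) {
    my $got = as_string( $builders{$name}->() );
    print "$name: ", ( $got eq $expected ? 'matches' : 'DIFFERS' ), "\n";
}
```

Returning the reference adds a small constant cost to every variation, so this form is meant for checking correctness rather than for timing.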

Re^2: better array to hash conversion
by BrowserUk (Patriarch) on Dec 11, 2012 at 14:41 UTC

    As the array size grows, it doesn't take long for the OP's original to outpace variation1. It only requires 200,000 or so elements for that to happen, and the benefits mount geometrically as the array size grows:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Benchmark qw(:all);

    our @array = 'aaaa' .. 'lzzz';
    print "$#array\n";

    sub original {
        my %hash;
        for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx; }
    }

    sub variation1 {
        my %hash;
        @hash{ @array } = 0 .. $#array;
    }

    sub variation2 {
        my %hash = map { $array[$_] => $_ } 0..$#array;
    }

    sub variation3 {
        my $idx = 0;
        my %hash = map { $_ => $idx++ } @array;
    }

    sub variation4 {
        my $idx = 0;
        my %hash;
        $hash{ $_ } = $idx++ for @array;
    }

    cmpthese -5, {
        'original'   => \&original,
        'variation1' => \&variation1,
        'variation2' => \&variation2,
        'variation3' => \&variation3,
        'variation4' => \&variation4,
    };

    __END__
    C:\test>junk91
    210911
                 Rate variation2 variation3 variation1 original variation4
    variation2 2.08/s         --        -2%       -36%     -38%       -42%
    variation3 2.12/s         2%         --       -35%     -37%       -41%
    variation1 3.26/s        57%        54%         --      -3%        -9%
    original   3.37/s        62%        59%         3%       --        -6%
    variation4 3.57/s        72%        68%         9%       6%         --

    (I've added another variation that works better for large arrays.)
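To see where the crossover falls on a given machine, the comparison can be repeated at several array sizes. This is only a sketch under assumptions: `make_array` and `compare_at_size` are hypothetical helpers, the subs take an array reference and return a hash reference (unlike the versions above), and the sizes are read from the command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: build an array of $n distinct string keys.
sub make_array {
    my ($n) = @_;
    return [ map { 'key' . $_ } 1 .. $n ];
}

# C-style loop, as in 'original'.
sub original {
    my ($array) = @_;
    my %hash;
    for (my $idx = 0; $idx < @$array; $idx++) { $hash{ $array->[$idx] } = $idx; }
    return \%hash;
}

# Hash slice assignment, as in 'variation1'.
sub variation1 {
    my ($array) = @_;
    my %hash;
    @hash{@$array} = 0 .. $#{$array};
    return \%hash;
}

# Statement-modifier for with a running counter, as in 'variation4'.
sub variation4 {
    my ($array) = @_;
    my $idx = 0;
    my %hash;
    $hash{$_} = $idx++ for @$array;
    return \%hash;
}

# Hypothetical helper: run cmpthese for one array size. A negative count
# tells Benchmark to run each sub for at least that many CPU seconds.
sub compare_at_size {
    my ($n, $cpu_seconds) = @_;
    my $array = make_array($n);
    print "array size: $n\n";
    cmpthese( -$cpu_seconds, {
        original   => sub { original($array) },
        variation1 => sub { variation1($array) },
        variation4 => sub { variation4($array) },
    } );
}

# Nothing is timed unless sizes are given on the command line.
compare_at_size( $_, 3 ) for @ARGV;
```

Saved under any file name, it could be run as `perl sizes.pl 8 1000 200000`. The rates will differ from the figures above, since they depend on the perl build, the platform, and the key strings used.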


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      This is very interesting. Do you know why, and would you care to explain it?

      Thanks and best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Because all of the map-based methods construct, copy, and later discard multiple intermediary lists on their way to constructing the hash.

        And the costs of doing large numbers of small memory allocations and deallocations add up, especially if the memory manager has to go to the OS a couple of times to expand the process memory pool.

        Using for avoids constructing many of the intermediary lists.
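One way to picture the point about intermediary lists is to spell the map form out in two steps. This is a rough illustration only, not a description of what perl does internally: `@pairs` is a named stand-in (an assumption for this sketch) for the flat key/value list that map produces before the hash assignment consumes it.

```perl
use strict;
use warnings;

my @array = qw(a b c d);

# map form, spelled out: the whole flat list (a, 0, b, 1, ...) exists
# before it is assigned to the hash.
my $i = 0;
my @pairs = map { $_ => $i++ } @array;    # twice as many elements as @array
my %via_map = @pairs;

# for form: each key/value pair is stored directly, one at a time.
my $j = 0;
my %via_for;
$via_for{$_} = $j++ for @array;

print scalar(@pairs), " list elements built for ", scalar(@array), " keys\n";
# prints "8 list elements built for 4 keys"
```

Both hashes end up identical; the difference is only in how much temporary data is built along the way.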



Re^2: better array to hash conversion
by GrandFather (Saint) on Dec 11, 2012 at 20:10 UTC

    This is premature optimisation gone mad! For speed to be a factor in this decision, the array would, for most purposes, need to contain on the order of 1 million entries.

    Sure, benchmarks are fun to write (although often hard to make meaningful), but the overriding criterion in this sort of coding decision is the clarity and maintainability of the code. By that metric, on both counts, 'original' is way down the list. I'd go for variation1 or a for-modifier version of 'original', either of which is clear, succinct and not particularly prone to coding errors.
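A "for modifier version of 'original'" is not spelled out in the post. One plausible reading, offered here as an assumption, keeps the index-based assignment but replaces the C-style loop with a statement modifier:

```perl
use strict;
use warnings;

my @array = qw(a b c d e f g h);

# 'original' rewritten with a statement-modifier for: the loop variable $_
# is the index, so no separate counter or loop condition is needed.
my %hash;
$hash{ $array[$_] } = $_ for 0 .. $#array;

print join( ' ', map { $_ . '=' . $hash{$_} } sort keys %hash ), "\n";
# prints "a=0 b=1 c=2 d=3 e=4 f=5 g=6 h=7"
```

Compared with the C-style loop, there is no loop condition or increment to get wrong, which fits the point about coding errors.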

    True laziness is hard work
      This is premature optimisation gone mad! For most purposes for speed to be a factor in this decision the array would need to contain of the order of 1 million entries.

      Actually, significant differences start at around 200,000; and as someone who regularly does similar processing with 10s and even 100s of millions, knowing what works quickest whilst avoiding unnecessary memory growth is important.

      And unless you have some psychic insight into the OP's application, you have nothing on which to base your conclusions, so it is they that are "premature".



Re^2: better array to hash conversion
by perltux (Monk) on Dec 11, 2012 at 14:27 UTC
    Many thanks for that. I find it very interesting that the 'for' loop is actually the second fastest solution, faster than the solutions using 'map'.

      A for solution is generally faster than a map, and it has been discussed in map versus for amongst other places. Sometimes however it is more expressive to use "map".

      A Monk aims to give answers to those who have none, and to learn from those who know more.
Re^2: better array to hash conversion
by Anonymous Monk on Dec 11, 2012 at 19:09 UTC
