Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

better array to hash conversion

by perltux (Beadle)
on Dec 11, 2012 at 12:18 UTC ( #1008288=perlquestion: print w/ replies, xml ) Need Help??
perltux has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have an array and want to convert it into a hash where the array values become the keys and the array index numbers become the values of the hash.
The following code does this, but I wonder if there is a more elegant and/or efficient way of doing this?
(The content of the array could be anything, not necessarily the letter sequence I used in my example below)

my @array=qw(a b c d e f g h); my %hash; for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx; }

Comment on better array to hash conversion
Download Code
Re: better array to hash conversion
by Anonymous Monk on Dec 11, 2012 at 12:23 UTC

    The following code does this, but I wonder if there is a more elegant and/or efficient way of doing this?

    Really? What search terms did you use to look?

    Did you benchmark?

     my %hash; @hash{ @array } = 0 .. $#array;

      I did do searches for "array to hash conversion" but all results came up with examples that use 'map' and did not seem to work for my specific conversion (the array values becoming the keys and the array index numbers becoming the values).

      Sorry for asking what might look like a dumb question to an expert, but my Perl skills are still limited.

      Anyway, thanks for your reply and thanks to everybody else who replied, too!

        Sorry for asking what might look like a dumb question to an expert, but my Perl skills are still limited.

        :) My intent was not to admonish, I was interested to know your search terms (you should share them if you searched) as this topic can be hard to find.

        I tried a bunch variations, I even resorted to ?node_id=3989;BIT=benchmark%20hash%7B, closest I found was Array to Hash, Using map to create a hash/bag from an array -- not your case exactly, but seems to cover all available syntax -- you could adapt it

Re: better array to hash conversion
by clueless newbie (Friar) on Dec 11, 2012 at 12:23 UTC
    @hash{@array}=(0..$#hash};

    Should really read

    @hash{@array}=(0..$#array};

    Thanks for the catch, perltux!

      Shouldn't that be $#array rather than $#hash ?

        Absolutely!

Re: better array to hash conversion
by karlgoethebier (Curate) on Dec 11, 2012 at 12:28 UTC

    Like this..

    use Data::Dumper; my @array = (a..z); my %hash = map { $array[$_] => $_ } 0..$#array; print Dumper(\%hash); __END__ $VAR1 = { 'w' => 22, 'r' => 17, 'a' => 0, 'x' => 23, 'd' => 3, 'j' => 9, 'y' => 24, 'u' => 20, 'k' => 10, 'h' => 7, 'g' => 6, 'f' => 5, 't' => 19, 'i' => 8, 'e' => 4, 'n' => 13, 'v' => 21, 'm' => 12, 's' => 18, 'l' => 11, 'c' => 2, 'p' => 15, 'q' => 16, 'b' => 1, 'z' => 25, 'o' => 14 };

    Update...i cheated this one ;-)

    @hash{@array} = 0..$#array;
    map

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Actually I do like your example with 'map' because it made me understand better how 'map' works (even though @hash{@array} = 0..$#array; is probably the better solution here).
Re: better array to hash conversion
by ruzam (Curate) on Dec 11, 2012 at 14:06 UTC

    A little bench-marking can be helpful. For the case of converting an array to a hash where the array values become keys and the array index become values.

    I've bench-marked the OP code as well as the variations presented in response, in addition I added my own solution to the problem (variation3). The clear winner is variation1.

    my %hash; @hash{ @array } = 0 .. $#array;
    #!/usr/bin/env perl use strict; use warnings; use Benchmark qw(:all); my @array=qw(a b c d e f g h); sub original { my %hash; for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx;} } sub variation1 { my %hash; @hash{ @array } = 0 .. $#array; } sub variation2 { my %hash = map { $array[$_] => $_ } 0..$#array; } sub variation3 { my $idx = 0; my %hash = map { $_ => $idx++ } @array; } cmpthese(-10, { 'original' => sub{ original() }, 'variation1' => sub{ variation1() }, 'variation2' => sub{ variation2() }, 'variation3' => sub{ variation3() }, });
    results:
    Rate variation2 variation3 original variation1 variation2 142570/s -- -15% -35% -49% variation3 168018/s 18% -- -24% -40% original 220185/s 54% 31% -- -21% variation1 279147/s 96% 66% 27% --
      Many thanks for that. I find it very interesting that the 'for' loop is actually the second fastest solution, faster than the solutions using 'map'.

        A for solution is generally faster than a map, and it has been discussed in map versus for amongst other places. Sometimes however it is more expressive to use "map".

        A Monk aims to give answers to those who have none, and to learn from those who know more.

      As the array size grows, it doesn't take long for the OPs original to out pace variation1. It only requires a 200,000 or so for that to happen, and the benefits mount geometrically as the array size grows:

      #!/usr/bin/env perl use strict; use warnings; use Benchmark qw(:all); our @array = 'aaaa' .. 'lzzz'; print "$#array\n"; sub original { my %hash; for (my $idx=0; $idx<@array; $idx++) { $hash{$array[$idx]} = $idx;} } sub variation1 { my %hash; @hash{ @array } = 0 .. $#array; } sub variation2 { my %hash = map { $array[$_] => $_ } 0..$#array; } sub variation3 { my $idx = 0; my %hash = map { $_ => $idx++ } @array; } sub variation4 { my $idx = 0; my %hash; $hash{ $_ } = $idx++ for @array; } cmpthese -5, { 'original' => \&original, 'variation1' => \&variation1, 'variation2' => \&variation2, 'variation3' => \&variation3, 'variation4' => \&variation4, }; __END__ C:\test>junk91 210911 Rate variation2 variation3 variation1 original variatio +n4 variation2 2.08/s -- -2% -36% -38% -4 +2% variation3 2.12/s 2% -- -35% -37% -4 +1% variation1 3.26/s 57% 54% -- -3% - +9% original 3.37/s 62% 59% 3% -- - +6% variation4 3.57/s 72% 68% 9% 6% +--

      (I've added another variation that works better for large arrays.)


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        This is very interesting. Do you know why - and like to explain it?

        Thanks and best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

      presize/preallocate buckets , change the number of elements, change the perl version, and the winner changes :)

      funny bench :)

      #!/usr/bin/perl -- use strict; use warnings; use Benchmark qw/ cmpthese /; #~ my $cmptime = -abs(shift); #~ my $cmptime = -3; my $cmptime = -10; printf "%10s cmptime %s\n", $], $cmptime; my @array; for my $power ( 8, 16,18 ) { #~ for my $power ( 8, 16, 20, 24 ){ @array = 1 .. ( 2**$power ); print "2 ** $power == @{[int @array ]}\n"; cmpthese( $cmptime, { map { ; $_ => __PACKAGE__->can( $_ ) } qw/ ffor fforc ifor imap mmap sffor sfforc sifor simap slic smap simap2 / } ); } exit; sub mmap { my %hash = map { ; $array[$_] => $_ } 0 .. $#array; (); } sub imap { my $ix = 0; my %hash = map { ; $_ => $ix++ } @array; (); } sub slic { my %hash; @hash{@array} = 0 .. $#array; (); } sub ffor { my %hash; for my $ix ( 0 .. $#array ) { $hash{ $array[$ix] } = $ix; } (); } sub fforc { my %hash; for( my $ix = 0 ; $ix < @array ; $ix++ ) { $hash{ $array[$ix] } = +$ix; } (); } sub ifor { my %hash; my $ix = 0; $hash{$_} = $ix++ for @array; (); } ## preSize sub sffor { my %hash; keys( %hash ) = int @array; for my $ix ( 0 .. $#array ) { $hash{ $array[$ix] } = $ix; } (); } sub sfforc { my %hash; keys( %hash ) = int @array; for( my $ix = 0 ; $ix < @array ; $ix++ ) { $hash{ $array[$ix] } = +$ix; } (); } sub sifor { my %hash; my $ix = 0; keys( %hash ) = int @array; $hash{$_} = $ix++ for @array; (); } sub smap { my %hash; keys( %hash ) = int @array; @hash{@array} = 0 .. $#array; (); } sub simap { my $i = 0; my %hash; keys( %hash ) = int @array; %hash = map { $_ => $i++ } @array; (); } sub simap2 { my $i = 0; my %hash; keys( %hash ) = int @array; %hash = map(( $_ => $i++ ), @array); (); } __END__

      funny results :)

      5.008008 c -10 2 ** 8 == 256 Rate simap2 simap2 2721/s simap 2770/s mmap 3196/s imap 3505/s fforc 3981/s sffor 4335/s sfforc 4535/s ffor 4838/s ifor 5027/s sifor 5798/s smap 8123/s slic 9384/s 5.008008 c -10 2 ** 16 == 65536 Rate simap2 simap2 3.82/s simap 3.83/s mmap 4.53/s imap 5.26/s sffor 7.86/s sfforc 7.89/s fforc 8.09/s smap 8.46/s sifor 8.68/s ffor 8.83/s ifor 8.93/s slic 9.47/s 5.008008 c -10 2 ** 18 == 2621443 Rate simap2 simap2 0.810/s simap 0.825/s mmap 0.979/s imap 1.18/s smap 1.68/s sfforc 1.75/s sffor 1.80/s sifor 1.90/s fforc 1.90/s slic 1.93/s ffor 2.08/s ifor 2.10/s
      5.008009 c -10 2 ** 8 == 256 Rate simap simap 1767/s simap2 1775/s mmap 2017/s imap 2183/s fforc 2980/s sffor 3326/s ffor 3389/s ifor 3565/s sfforc 4920/s sifor 6445/s smap 8908/s slic 9698/s 5.008009 c -10 2 ** 16 == 65536 Rate mmap mmap 2.36/s simap2 2.37/s simap 2.37/s imap 2.51/s sffor 5.77/s fforc 6.03/s ffor 6.42/s ifor 6.53/s sfforc 7.95/s smap 8.75/s sifor 8.83/s slic 9.82/s 5.008009 c -10 2 ** 18 == 2621442 **too few iter Rate imap imap 0.187/s mmap 0.259/s simap2 0.298/s simap 0.302/s sffor 1.44/s fforc 1.50/s ffor 1.60/s ifor 1.63/s smap 1.68/s sfforc 1.75/s slic 1.91/s sifor 1.92/s
      5.012002 c -10 2 ** 8 == 256 Rate mmap mmap 1364/s imap 1456/s simap 1734/s simap2 1742/s fforc 3086/s sffor 3224/s ffor 3479/s ifor 3551/s sfforc 5073/s sifor 6343/s smap 8775/s slic 9969/s 5.012002 c -10 2 ** 16 == 65536 Rate mmap mmap 1.99/s imap 2.01/s simap 3.10/s simap2 3.11/s sffor 6.63/s fforc 6.81/s ffor 7.27/s ifor 7.35/s sfforc 9.58/s sifor 10.5/s smap 10.6/s slic 12.1/s 5.012002 c -10 2 ** 18 == 2621442 **too few iter Rate imap imap 0.211/s mmap 0.242/s simap2 0.406/s simap 0.409/s sffor 1.67/s fforc 1.75/s ffor 1.85/s ifor 1.89/s sfforc 2.11/s smap 2.11/s sifor 2.32/s slic 2.46/s
      5.014001 c -10 2 ** 8 == 256 Rate mmap mmap 2056/s imap 2163/s simap 2167/s simap2 2191/s sfforc 3280/s fforc 3339/s sffor 3799/s ffor 3854/s sifor 3880/s ifor 3897/s smap 4659/s slic 4748/s 5.014001 c -10 2 ** 16 == 65536 Rate imap imap 2.07/s mmap 2.44/s simap2 2.59/s simap 2.60/s sfforc 7.01/s fforc 7.13/s sifor 7.58/s sffor 7.63/s slic 7.72/s smap 7.74/s ffor 7.75/s ifor 7.81/s 5.014001 c -10 2 ** 18 == 2621440 **too few iter **too few iter **too few iter Rate imap imap 0.156/s mmap 0.173/s simap2 0.177/s simap 0.177/s sfforc 1.62/s fforc 1.63/s slic 1.67/s smap 1.67/s sifor 1.75/s sffor 1.76/s ffor 1.77/s ifor 1.79/s
      5.016001 c -10 2 ** 8 == 256 Rate mmap mmap 2227/s simap 2282/s simap2 2283/s imap 2321/s sfforc 3331/s fforc 3382/s sffor 3484/s ffor 3575/s sifor 3772/s ifor 3831/s smap 4733/s slic 4829/s 5.016001 c -10 2 ** 16 == 65536 Rate imap imap 2.06/s mmap 2.48/s simap 2.59/s simap2 2.65/s sfforc 6.90/s smap 6.92/s slic 7.01/s sffor 7.07/s fforc 7.12/s ffor 7.33/s sifor 7.43/s ifor 7.69/s 5.016001 c -10 2 ** 18 == 2621440 **too few iter **too few iter **too few iter Rate imap imap 0.156/s mmap 0.176/s simap2 0.177/s simap 0.178/s smap 1.45/s slic 1.47/s sfforc 1.60/s fforc 1.63/s sffor 1.65/s ffor 1.68/s sifor 1.70/s ifor 1.75/s

      This is premature optimisation gone mad! For most purposes for speed to be a factor in this decision the array would need to contain of the order of 1 million entries.

      Sure, benchmarks are fun to write (although often hard to make meaningful), but the overwhelming criteria in this sort of coding decision is clarity and maintainability of the code. By that metric on both counts 'original' is way down the list. I'd go for variation 1 or a for modifier version of 'original', either of which is clear, succinct and not particularly prone to coding errors.

      True laziness is hard work
        This is premature optimisation gone mad! For most purposes for speed to be a factor in this decision the array would need to contain of the order of 1 million entries.

        Actually, significant differences start at around 200,000; and as someone who regularly does similar processing with 10s and even 100s of millions, knowing what works quickest whilst avoiding unnecessary memory growth is important.

        And unless you have some psychic insight to the OPs application, you have nothing on which to base your conclusions, so it is they that are "premature".


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

Re: better array to hash conversion
by Anonymous Monk on Dec 11, 2012 at 15:14 UTC

    While the hash slice method is the most idiomatic, here's one more variation, based on the OP's code -- just taking out the C-like for loop:

    $hash{ $array[$_] } = $_ for (0..$#array);

    And something that works only after 5.12 but is very easy to read:

    use 5.012; while (my ($idx, $val) = each(@array)) { $hash{$val} = $idx; }
Re: better array to hash conversion
by davido (Archbishop) on Dec 11, 2012 at 18:41 UTC

    Another way of looking at the problem:

    When you convert the array to a hash with keys as the array's values, and values as the array's indices, you pay the price for conversion once. Your subsequent lookups will be quite fast. But you do pay for it; the overhead of the hashing algorithm, combined with the O(n) time complexity of converting the entire array to a hash.

    On the other hand, if all you're interested in is an occasional search that yields an index, you could use List::MoreUtils first_index function:

    use List::MoreUtils 'first_index'; my @array = qw( a b c d e f g h ); my $found_ix = first_index{ $_ eq 'd' } @array; print "Found 'd' at $found_ix.\n"; __END__ output: Found 'd' at 3.

    This avoids the one-time overhead of generating hash keys for the entire structure, and the per-search overhead of hash lookups. But now every lookup will be an O(n) operation. If you're doing a lot of lookups this is a net loss. If you're doing few lookups, it could be a win, which would have to be verified via benchmarking.

    One nice thing about the first_index method is that its semantics are pretty clear. But if you're doing frequent lookups your original idea of using a hash lookup is good.


    Dave

      Thank you very much and best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1008288]
Approved by bart
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (16)
As of 2014-08-20 18:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (121 votes), past polls