Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

use Benchmark 'cmpthese'; use List::Util 'sum'; our %hash = ( one => 1, two => 2, three => 3, four => 4, five => 5, six => 6, seven => 7, eight => 8, nine => 9, ten => 10, ); our @array = ( 1 .. 10 ); use constant { ONE => 0, TWO => 1, THREE => 2, FOUR => 3, FIVE => 4, SIX => 5, SEVEN => 6, EIGHT => 7, NINE => 8, TEN => 9, }; cmpthese ( -5, { hash => q/ my $n = sum( @hash{ 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten' } ); /, array => q/ my $n = sum( @array[ ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN ] ); /, } );


Rate hash array hash 1673833/s -- -46% array 3103809/s 85% --

I've got the 2nd and 6th edition of Learning Perl, and haven't been able to find anywhere in either edition where the book states that the efficiency advantages of arrays versus hashes manifest themselves when using shift/unshift/push//pop. Nor have I found anywhere in the book that states that a hash lookup in Perl costs the same as an array lookup. If you know of somewhere in that book I should be reading, please provide the exact quote, along with which edition I might find it in.

I think you're probably referring to this paragraph:

Some implementations of hashes (such as in the original awk language, where Larry borrowed the idea from) slow down as the hashes get larger and larger. This is not the case in Perl—it has a good, efficient, scalable algorithm. So, if a hash has only three key-value pairs, it’s very quick to “reach into the barrel” and pull out any one of those. If the hash has three million key-value pairs, it should be just about as quick to pull out any one of those. A big hash is nothing to fear.

That paragraph makes an assertion that the computational complexity of hash inserts or lookups is approximately constant, as the size of the hash grows toward infinity. And the same could be said of arrays. But order of growth in computational complexity says nothing about the amount of computational complexity there is that stays constant as the data set grows.

To understand this concept, look at the following two simple scenarios:

  • Find the numeric value 100 in an unsorted array of unique integers: We know that as the array grows, the growth in computational complexity is linear.
  • Find the string "David" in an unsorted array of strings.: We know that as the array grows, the growth in computational complexity is linear.

So those are both O(n) searches. However, to find numeric 100, we do a series of numeric comparisons while iterating over the elements of the array. To find "David", we must go to the first element and see if the first character in the first element is a "D". If it is, then we compare the second character and see if it is an "a". Each letter of the string results in another char comparison... essentially an integer comparison for each character of the string (short-circuiting as soon as there's a mismatch)... repeated for each element in the data set.

So it's pretty easy to see in this example that the string comparisons have an overhead per element that doesn't exist for the numeric comparisons.

Now look at what goes on in hash lookups versus array lookups. To look up a single element in an array, internally, Perl must first start with the base address of the array. Then it must add an integral offset to the array's base address. This offset will be the element number times the size of the element's structure. This is a simple mathematical calculation; base + offset * size. Once that calculation is performed, Perl can dereference a pointer contained in the element, and fetch the element's scalar entity.

Now for a Perl hash. First Perl must examine the string of some arbitrary length, and using a hashing algorithm to compute the bucket. That bucket may contain more than one element in the form of a linked list. The linked list would then be walked to arrive at the specific element being searched for. Fortunately the linked list chains are always kept pretty short, so we don't worry about their growth. But now instead of a simple base+offset*size equation, we feed a (usually) multi-byte string to a hashing algorithm, that gives us an offset, that leads us to a short chain of elements to scan through.

So the array lookup has some constant overhead, and the hash lookup has some constant overhead (overhead that doesn't change as the size of the structure grows). But the amount of constant overhead for the array is smaller than the amount of constant overhead for the hash.

We mostly don't worry about that. If we were looking for bleeding edge performance we would be using C or C++ to begin with. But the point to my mention of efficiency in this case was this: Sometimes function tables are used inside of tight loops. And infrequently, those tight loops become a performance problem. In such cases, switching from a hash based dispatch table to an array one will bring some performance improvement.

The more important point is that if one needs to assure true "switch-like" behavior, where the order of comparisons is predictable, an array is more appropriate; a hash based dispatch table doesn't guarantee a predictable order in which cases will be tested for a match.


In reply to Re^3: Given When Syntax by davido
in thread Given When Syntax by Deep_Plaid

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others drinking their drinks and smoking their pipes about the Monastery: (4)
    As of 2020-09-19 09:17 GMT
    Find Nodes?
      Voting Booth?
      If at first I don’t succeed, I …

      Results (114 votes). Check out past polls.