|Problems? Is your data what you think it is?|
Re^3: Given When Syntaxby davido (Cardinal)
|on Mar 17, 2014 at 05:19 UTC||Need Help??|
I've got the 2nd and 6th edition of Learning Perl, and haven't been able to find anywhere in either edition where the book states that the efficiency advantages of arrays versus hashes manifest themselves when using shift/unshift/push//pop. Nor have I found anywhere in the book that states that a hash lookup in Perl costs the same as an array lookup. If you know of somewhere in that book I should be reading, please provide the exact quote, along with which edition I might find it in.
I think you're probably referring to this paragraph:
That paragraph makes an assertion that the computational complexity of hash inserts or lookups is approximately constant, as the size of the hash grows toward infinity. And the same could be said of arrays. But order of growth in computational complexity says nothing about the amount of computational complexity there is that stays constant as the data set grows.
To understand this concept, look at the following two simple scenarios:
So those are both O(n) searches. However, to find numeric 100, we do a series of numeric comparisons while iterating over the elements of the array. To find "David", we must go to the first element and see if the first character in the first element is a "D". If it is, then we compare the second character and see if it is an "a". Each letter of the string results in another char comparison... essentially an integer comparison for each character of the string (short-circuiting as soon as there's a mismatch)... repeated for each element in the data set.
So it's pretty easy to see in this example that the string comparisons have an overhead per element that doesn't exist for the numeric comparisons.
Now look at what goes on in hash lookups versus array lookups. To look up a single element in an array, internally, Perl must first start with the base address of the array. Then it must add an integral offset to the array's base address. This offset will be the element number times the size of the element's structure. This is a simple mathematical calculation; base + offset * size. Once that calculation is performed, Perl can dereference a pointer contained in the element, and fetch the element's scalar entity.
Now for a Perl hash. First Perl must examine the string of some arbitrary length, and using a hashing algorithm to compute the bucket. That bucket may contain more than one element in the form of a linked list. The linked list would then be walked to arrive at the specific element being searched for. Fortunately the linked list chains are always kept pretty short, so we don't worry about their growth. But now instead of a simple base+offset*size equation, we feed a (usually) multi-byte string to a hashing algorithm, that gives us an offset, that leads us to a short chain of elements to scan through.
So the array lookup has some constant overhead, and the hash lookup has some constant overhead (overhead that doesn't change as the size of the structure grows). But the amount of constant overhead for the array is smaller than the amount of constant overhead for the hash.
We mostly don't worry about that. If we were looking for bleeding edge performance we would be using C or C++ to begin with. But the point to my mention of efficiency in this case was this: Sometimes function tables are used inside of tight loops. And infrequently, those tight loops become a performance problem. In such cases, switching from a hash based dispatch table to an array one will bring some performance improvement.
The more important point is that if one needs to assure true "switch-like" behavior, where the order of comparisons is predictable, an array is more appropriate; a hash based dispatch table doesn't guarantee a predictable order in which cases will be tested for a match.