|XP is just a number|
Re: Why are hashes so much faster?by plaid (Chaplain)
|on Jun 18, 2000 at 01:53 UTC||Need Help??|
For a simple lookup at a given index, an array cannot be beaten for pure speed. I'm assuming what you were trying to do in the original script was check to see if a certain value was contained in an array. I will use a bit of terminology from lhoward's excellent post here, namely the big-O notation.
Hashing is done by taking the key of the hash, performing a "hashing" function on this key, and obtaining a number from it. Then, this number is used as an index into an array. A good hashing function will produce an array that is fairly well populated. For instance, if your key is 'name', a function will be called with the argument of 'name'. Some pseudocode might look like
The hashing_function is usually something that calculates a value based on the string, and performs a modulo (%) by the fixed size of the array. Then, for any subsequent lookups into the hash table, you run the key through the hashing_function again, and take whatever is at that index.
If more than one key hashes to the same index, usually an array is built off of that array, which is why hashes are -ideally- O(1), but not necessarily.
Array lookups are always O(1), and faster than hash lookups since they don't have to perform any hashing function beforehand. My guess is that you were having to do some traversing of the array to look for a certain value, in which case, your lookup function would be O(n), where n is the size of the array. By switching to a hash table, you were able to increase the speed of this lookup function. Other places where a hash would help would be in a delete function, as you could just get rid of the value, instead of having to worry about shifting all values down if the deleted index is in the middle of the array.
In general, hashes are much more efficient than arrays for anything that needs to have a certain relationship between index and its value, but arrays are much faster than hashes if all you want is a list of values where the index is unimportant.
Hope this all makes sense.