note
plaid
For a simple lookup at a given index, an array cannot be
beaten for pure speed. I'm assuming what you were trying
to do in the original script was check to see if a certain
value was contained in an array. I will use a bit of
terminology from [lhoward]'s excellent post
<a href=http://perlmonks.org/index.pl?node_id=17890>here</a>,
namely the big-O notation.
<p>
Hashing is done by taking the key of the hash, performing a
"hashing" function on this key, and obtaining a number from it.
Then, this number is used as an index into an array. A good
hashing function will produce an array that is fairly well
populated. For instance, if your key is 'name', a function
will be called with the argument of 'name'. Some pseudocode
might look like
<code>
my $random = hashing_function('name');
$array_of_hash_values[$random] = $value;
</code>
The hashing_function is usually something that calculates a value
based on the string, and performs a modulo (%) by the fixed
size of the array. Then, for any subsequent lookups into the hash
table, you run the key through the hashing_function again, and take
whatever is at that index.
<code>
sub lookup_value {
my $index = hashing_function(shift());
return $array_of_hash_values[$index];
}
</code>
If more than one key hashes to the same index, usually an array is
built off of that array, which is why hashes are -ideally- O(1), but
not necessarily.
<p>
Array lookups are always O(1), and faster than hash lookups
since they don't have to perform any hashing function beforehand.
My guess is that you were having to do some traversing of the
array to look for a certain value, in which case, your lookup
function would be O(n), where n is the size of the array. By
switching to a hash table, you were able to increase the speed
of this lookup function. Other places where a hash would help
would be in a delete function, as you could just get rid of the
value, instead of having to worry about shifting all values down
if the deleted index is in the middle of the array.
<p>
In general, hashes are much more efficient than arrays for anything
that needs to have a certain relationship between index and its
value, but arrays are much faster than hashes if all you want is
a list of values where the index is unimportant.
<p>
Hope this all makes sense.
18682
18682