in reply to Re: Slowness when inserting into pre-extended array
in thread Slowness when inserting into pre-extended array
What I think you're running into is a basic shortcoming of hashes in general, and of Perl's in particular. What happens is that each hash key is internally "hashed" into a 32-bit integer value, and that value selects the bucket the key is stored in. A lookup then only has to examine the few keys that landed in the same bucket, which is what normally makes hash access so fast.
Ideally every different key would get a different hash value. In practice, with huge numbers of keys, the chance of two different keys hashing into the same bucket is actually quite large. Such collisions are handled by chaining all the keys that share a bucket into a linked list. In the "worst case" scenario, you can wind up with all of your keys in a single bucket.
The search in such a linked list is linear, so every key added to it makes each subsequent lookup a little slower.
Now, the distribution of the hash keys in your dataset may have some of these worst cases in it.
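One way to get a rough feel for this (just a sketch; the exact numbers depend on your build and data) is to look at a hash in scalar context, which on 5.8-era perls reports used/allocated buckets:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Diagnostic sketch: on 5.8-era perls a hash in scalar context
# reports "used/allocated" buckets. If "used" stays tiny while the
# key count grows, many keys share a bucket and each lookup degrades
# into a linear scan of that bucket's chain.
# (Newer perls print just the element count here instead.)
my %h;
$h{"key$_"} = 1 for 1 .. 10_000;
print scalar(%h), "\n";
```

If the first number is close to the second, your keys are spreading out nicely; if it stays very small, you're looking at long chains.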
So how can this be fixed? Good question. You could try changing the keys by prepending some fixed string to each of them. This changes the hash value of every key, hopefully spreading them over different buckets and breaking up the worst case.
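As a sketch of that idea (the salt string "S1:" is an arbitrary choice; the only requirement is that you apply the same one on every store and fetch):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Workaround sketch: prepend a fixed salt so every key hashes to a
# different 32-bit value than before, hopefully landing the keys in
# different buckets. The salt "S1:" is arbitrary.
my $salt = 'S1:';
my %count;
for my $key (qw(foo bar foo baz)) {
    $count{ $salt . $key }++;
}
# fetch with the same salt applied:
print $count{ $salt . 'foo' }, "\n";    # prints 2
```

Whether this actually helps depends on your particular keys, so it's worth benchmarking before and after.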
If you're not stuck with your current Perl version, you might want to try out 5.8.1-RC1, which has a new feature that randomizes the distribution of keys over the buckets. This in itself will not eliminate the worst case scenario, but it does make it a moving target, so you won't hit that bad performance all of the time.
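You can see the randomization at work (assuming a perl built with it, such as 5.8.1-RC1) by running the same program twice and comparing the raw key order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With hash randomization, which bucket a key lands in depends on a
# per-process random seed, so the raw key order can differ between
# runs of the very same program. The data itself is unaffected.
my %h = map { $_ => 1 } 'a' .. 'j';
print join(',', sort keys %h), "\n";  # sorted: stable across runs
print join(',', keys %h), "\n";       # raw order: may vary per run
```

This is also a reminder never to rely on the order of `keys` or `each` in your own code.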
This randomization was introduced because the delay you experienced can be deliberately provoked: an attacker who knows the hash function can submit keys that all collide, turning it into a denial-of-service attack against CGI scripts.
Hope this helps.
Liz