|Perl: the Markov chain saw|
Data structure challengeby Abigail-II (Bishop)
|on Mar 17, 2004 at 17:32 UTC||Need Help??|
IntroductionIt's commonly known that a hash is an appropriate datastructure if you want to store and modify a set of strings, and you want to perform matching queries, that is, quickly answer the question whether something is an element of the stored set.
There are some drawbacks about hashes. Accessing a key takes time, as we need to calculate the hash function. Furthermore, many elements could hash to the same address, which causes chains, and which slow down queries (and deletes). Furthermore, 'clearing' a hash takes time linear in the size of the stored set. (Some of the problems have been fixed in recent perls, but with some overhead when inserting elements). Implementation is also non-trivial.
If we know something about the set we are storing, we can sometimes use datastructures that are faster. For instance, suppose we know that all possible elements of the set are non-negative integers smaller than a fixed number U. We can then build an array (or a vec string) with U elements, say A, and we let i be a member of our set if and only if A [i] == 1. Otherwise, A [i] == 0. Using simple code, we have something like:
It's easy to see that inserting, deleting and querying all take O (1) time. Unfortunally initializing or clearing the structure takes time &Theta (U).
A special datastructureBut we can do better. Consider the following arrays A and B, and an integer C:
Now, a number n is in our query set, if and only if, all of the following conditions hold:
Now, how do we insert? Suppose we want to insert a number i. If it's already in the set (that is, the conditions mentioned above are fullfilled), nothing needs to be done. Otherwise, we store i in B [C], we store C in A [i], and we increment C.
Deletion is a bit trickerier, but still straightforward. Suppose we want to delete i. We do that by moving B [C - 1] to B [A [i]], as follows: first B [A [i]] = B [C - 1], then A [B [C - 1]] = A [i]. And finally we decrement C.
Clearing the set is simple, all one needs to do is to set C to 0.
So, we can do querying, inserting, deleting, and clearing in O (1) time. Not bad. But what about initializing? Well, that can be done in constant time as well. At least, in low level languages. The key is that we start off with C being 0, so no matter what is already in A or B, it never causes elements to be present. So, we do not need to initialize the arrays. As long as we can claim memory of appropriate size.
Hence, we can do all five operation in constant time - as long as the programming language doesn't insist on initializing.
ChallengeCan we create a similar datastructure in Perl? XS of course doesn't count.