|Perl: the Markov chain saw|
An alternate technique for database primary keysby sundialsvc4 (Abbot)
|on Nov 05, 2010 at 01:40 UTC||Need Help??|
Every relational database table should, of course, have a primary key. This should be a column whose only purpose in life is to uniquely identify the row, and this strictly for the database’s own, internal purposes. It should be a column whose values are utterly devoid of business or other meaning.
Every DBMS provides one way to do this: the “auto-increment integer.” When you insert a new row, this column will automatically be initialized with a value that is supposed to be unique. It can therefore be used as a primary key.
Sometimes you have no practical choice but to use this. If you are inserting rows using an INSERT...FROM statement, then this might be the only way to get the job done. But if you are inserting rows programmatically, here’s another approach that worked really well for me.
My application has a subroutine which will generate an 8-character random string of letters and numbers, prefixed by an optional string provided by the caller. (The algorithm is a very well-known “Perl one-liner,” therefore not shown here.) The choice of “8 characters’ is arbitrary, but it should simply be long enough that a “collision” is impossible. The domain of letters is chosen so that the strings won’t contain any troublesome punctuation-marks. The prefix makes it readable to humans. (For instance, you can guess that dataset_1ap6GoWR probably refers to a “dataset” of some kind...) I call these things monikers.
The advantage of this approach is that “every integer looks exactly like every other integer, but monikers never, ever will.” If the wrong kind of moniker is in some column, you can immediately detect it by eye. Or, by a query, e.g.:
select dataset_moniker from datasets where dataset_moniker not like 'dataset_%'
(Footnote: This is a great application for any of the Test modules in CPAN ...)
My application also features an object called Filer. Every database manipulation that is done, is done by and through the Filer. Every question that is asked, and every change that is done, is accomplished by means of Filer methods (or by other “intelligent” program objects which rely exclusively upon Filer for their database-related services). As this application has continued to develop (it is a valiant and successful attempt to automate program-understanding of a legacy application that issues over 11,000 separate queries against more than 860 tables across a source-code library of 3,600+ DB2, Oracle and SAS programs ... all of them automatically discovered, parsed, analyzed and cataloged by this app ...) its data structures have changed many times. But the Filer is the only one who has to know.
As is usual for me, I didn’t know quite what I was getting into when I said that this app could be built in Perl. But it has been successful, and I thought I’d share the Meditation. HTH!™