Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: An improved technique for database primary keys

by herveus (Parson)
on Nov 05, 2010 at 12:54 UTC ( #869654=note: print w/ replies, xml ) Need Help??


in reply to An alternate technique for database primary keys

Howdy!

Every relational database table should, of course, have a primary key. This should be a column whose only purpose in life is to uniquely identify the row, and this strictly for the database’s own, internal purposes. It should be a column whose values are utterly devoid of business or other meaning.

I beg to differ. Every table should have a primary key. However, that key can (and often should) be comprised of multiple columns. What you describe here is the use of surrogate keys. Sometimes you have no real choice but to use a surrogate key, but they are frequently misapplied. Even if you do use a surrogate key, there should be a set of columns whose values identify the row. Those columns are the natural key. If any of those columns are permitted to be NULL, they can't be a primary key. The best you can do with them is make a UNIQUE index on them.

The risk with surrogate keys (especially if the natural key is ignored) is that you will get duplicate data. I've seen that in the wild. Use surrogate keys with care, if you must, but remember to identify the alternate key and enforce it.

Aside from that major quibble, I do like your idea.

yours,
Michael


Comment on Re: An improved technique for database primary keys
Re^2: An improved technique for database primary keys
by erix (Vicar) on Nov 05, 2010 at 13:34 UTC

      “Keyvil...”   :-D   I like it.   Thank you very much for the interesting links.

Reaped: Re^2: An improved technique for database primary keys
by NodeReaper (Curate) on Nov 05, 2010 at 13:34 UTC
Re^2: An improved technique for database primary keys
by sundialsvc4 (Abbot) on Nov 05, 2010 at 13:46 UTC

    You are correct:   these are, indeed, “surrogate” keys.

    “Constants aren’t.   Variables don’t.   Unique numbers aren’t (at least not if any human being has anything to do with it).   Even computer-generated “non-repeating” numbers sometimes do.   Plan accordingly.”

    As a general design principle, I routinely elect to use surrogate keys as the record-identifiers, i.e. for table-JOIN purposes, and to use UNIQUE indexes for any of the “natural” uses that you describe.

    I learned to do this when working on a very early project for an insurance company which had converted from an original paper-based provider-ID numbering system.   The same provider-ID was assigned to more than one provider, and providers had more than one ID.   Furthermore, business requirements made it impossible for this situation to be changed, because these numbers were widely-distributed in the field.   My strategy involved the use of surrogate keys.   These keys were “nonsensical character strings,” much like these, and they were never, ever divulged to the user.   The application stored the provider-ID number that had been entered, which was known to be ambiguous, and it resolved that ambiguity (by various rules and by user-input) and stored the corresponding (unambiguous) surrogate-key in another column.   All table linking took place using the surrogate columns.   This approach did successfully solve what was up to that time a very thorny business problem.

      Howdy!

      ...and you describe exactly the kind of situation where a surrogate key becomes necessary.

      I simply object to the terminology which implies that primary keys are, necessarily, opaque values stored in a single column. Changing the title to refer to surrogate keys would fix that.

      yours,
      Michael
Re^2: An improved technique for database primary keys
by Jenda (Abbot) on Nov 05, 2010 at 18:52 UTC

    Multicolumn primary keys are ... a major pita as soon as you need to do any joins. I had to write some queries for a schema that used them and it was horrible and error prone. It was way too easy to forget one of the three columns in the sixth join. Plus of course having to repeat all those columns in all those related tables is wasteful.

    If you want to ensure something is unique, use a unique index/constraint, but please do not ever even think of using a multicolumn PK!

    Update: I forgot to add ... unless the table is basically just implementation of an N-N relation. With or without additional columns. In that case you are unlikely to need to join using both columns at the same time (see ... you can't assume you'll always include the whole PK in all joins) and the surrogate ID would (almost) never be used.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      Howdy!

      It's a tradeoff. If you use surrogate keys, you always have to join to get the actual key values. Multicolumn PKs are natural. Surrogate keys obfuscate the true structure. Foreign key constraints are essential to document the key structures and to enforce referential integrity. They also help query building tools get the joins right.

      yours,
      Michael

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://869654]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2014-09-21 13:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (171 votes), past polls