Yesterday I posted a OT node to SoPW. This was the wrong
place to post such a node so it was rightfully deleted.
But I want to get everyones' opinnion about the wisdom
of declaring a Oracle column to represent SSN as varchar2(9).
Example:
create table OMG_BTW ( ... SSN varchar2(9), ... );
What happens if there is ever more that 999,999,999 people
with SSNs?
Would it be better to create the table like so ...
... or am just a worry wort that hasn't forgotten the nightmarecreate table OMG_BTW ( ... SSN NUMBER, ... );
of Y2K bugs. I guess it is important to know the current
population of the US which is 287,773,061
with the world population of 6,243,392,791 at the time of
this posting.
What I don't know is the rate at which the population is growing?
And what is better varchar2(9) or number?
Monks! Your country needs you please help!
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: SSN's possible new Y2K problem?
by seattlejohn (Deacon) on Aug 13, 2002 at 18:36 UTC | |
Personally I would *avoid* using a numeric field for a couple of reasons. For one thing, there's no guarantee that SSNs will always be numeric. The SSA could decide that making them a combination of letters and numbers provides more room for growth (or error checking, or whatever) without having to extend the field length. Also, using a numeric type means your code has to worry about things like management of leading zeros and overflowing the range of integers. Even though it's called a social security "number", I would tend to think of it as a social security "identifier". Also, to get off on a bit of a tangent, cases exist even today where nine numeric digits is not enough. A decade ago I wrote a billing application used in doctors' offices and initially made the SSN = 9 digits assumption, only to discover that in medical records, there is often a convention of using an alphabetic suffix to distinguish between a man and a woman who are married, but only one of whom has a social security number. (For example, if the man is 123-45-6789 and has a wife without an SSN, her "SSN" entry had to be coded as 123-45-6789-B, if memory serves.) Yuck! | [reply] |
(tye)Re: SSN's possible new Y2K problem?
by tye (Sage) on Aug 13, 2002 at 19:05 UTC | |
Personally, I don't consider it good design to have a field named "SSN". Have a field for "Identifier" or even "ID Number" and make it varchar(30). Don't you want your clients to ever have non-U.S.A. customers? Don't you ever want to have clients from other than the U.S.A.? For example, I live in the U.S.A. but my insurance company does not even know what my SSN is nor the SSNs of any of my family members. I do this for philosophical reasons but also because it greatly reduces how many forms and people end up with my SSN, which I feel reduces my risk of becoming a victim of identity theft. My ID number for insurance is assigned by the insurance company and starts with an "N", not a digit (though it is only 9 characters long). I pick 30 for the length because I remember when New York did something rather stupid. They changed their drivers' license numbers to be an encoding of identifying characteristics plus a sequence number (the first digit designated your eye color, etc.). That resulted in a 21-character driver's license number. This turned out to be very inconvenient for my employer who was selling software with a 20-character field for ID Number (which was rather larger than required by the drivers' license numbers of any state prior). - tye (but my friends call me "Tye") | [reply] |
by ichimunki (Priest) on Aug 14, 2002 at 16:42 UTC | |
| [reply] |
Re: SSN's possible new Y2K problem?
by dws (Chancellor) on Aug 13, 2002 at 18:12 UTC | |
If they become 10 (or 11, or 12) digits, there will be a big data migration problem. A lot of apps will have to be recompiled. But, if social security numbers start being recyled, the mess will become much, much bigger. SSNs could no longer be used as unique identifiers (a risky proposition anyway), but would require data range logic, to answer questions like "who held this SSN on this date?". I'm leaving the problem alone until the Social Security Adminstration provides guidance. See How Social Security Numbers Work for some background.
| [reply] |
Re: SSN's possible new Y2K problem?
by clintp (Curate) on Aug 14, 2002 at 00:26 UTC | |
If you think about it, until recently not everyone needed a SSN. At some point, not everyone needed an SSN: those not eligible (foreign workers depending on status prior to 1986, 1981, 1977, etc..), all of those stay-at-home parents who never worked or filed prior to 1988, any wage earner who chose not to participate prior to 1962, children prior to 1988 (and unclaimed dependants even now), and so on. So the SSA has only issued 140 million SSN's over the past 70 years. Don't get caught in the mental trap of using the SSN against population as an ID number, it doesn't quite work that way. It's not 1:1, and not even 1:N where N is constant over the last 70 years. They've got plenty left for a while to go. Not another 70 years for sure, but for a while. | [reply] |
by demerphq (Chancellor) on Aug 14, 2002 at 13:25 UTC | |
Consider a more flexible design: two tables, one that contains the definitions of various form of ID (such as US-SSN CA-SIN UK-whatever ...) which is referenced by the table that stores the IDS, in other words your US centric SSN becomes a two part ID, the SSN itself and the type of ID that it is. Yes of course this would mean that your software becomes slightly more complex, but then again its market potential goes from 270Million to the worlds population of 6 billion. A small tradeoff I woudl say. As a developer in an international company, with a North American origin, I am constantly amazed and amused at how often my North American colleagues make design decisions that mean their software is utterly unflexible in other enviornments and is utterly mated to a fixed NA concept of how to do things (and correspondingly useless outside of those areas). Examples include area codes, phone numbers, addresses, billing regulations, tax codes and the like. Designing your software to be properly flexible for alternate operating enviornments can only be a plus. An example is this: at some point (2-5 years maybe?) all of you in NA are going to have to go through something horrible, a telephone renumbering. (the UK monks will remember their (two!) experiences with this in recent years) This will involve adding at least one digit to every North American phone number, most likely but not necessarily the area code (a common approach has been to do away with the useful and convenient but unscalable fixed length area codes and go with flexible length area codes, this minimizes the number fo people who have to relearn phone numbers, as only the area code changes). So ask yourself this, of the code that you have been involved in, how much would have to be rewritten if this was to happen? How much extra work would it have really required to design your software so that it was flexible in this regard from the very begining? And ask yourself the most important question: which would have been cheaper?
Yves / DeMerphq | [reply] |
by Abigail-II (Bishop) on Aug 14, 2002 at 14:59 UTC | |
To make a table as flexible as possible, you'd make every column BLOB, GBLOB or whatever your favourite database has. Then you can store anything in there, and lots and lots of it! But it's better to stop and think "why in the heck am I using a database in the first place"? Because you have valuable data. Data that might be more valuable than programmer time. People (and programs) need to access the data, trust the data to be correct, and they need to be able to access it fast. The stricter the better. If you have an SSN column of maximal 9 characters, it's not possible to stuff 10 chars in it - the database will reject it. You can put "gooble" in it, and that's why a numeric field would be better. Speed can also be an issue. The smaller a row is, the more fit in a page, and the faster access will be. Also, the smaller your database is, the faster backups (and not unimportant, restores) will be. Don't get the idea diskspace is cheap, sure a disk at your local PC store doesn't cost a lot, but if you have to house it, mirror it, backup it and pay the person doing all this, the cost no longer is insignificant. Flexibility is a trade off, and especially with database more flexibility can easily mean a significant price you have to pay in speed and size. Abigail | [reply] |
by Daruma (Curate) on Aug 15, 2002 at 06:28 UTC | |
First let me agree with Abigail-II: "Speed can also be an issue." Speed is a big part of what a database is all about. Certainly, the relational model gives us flexibility and reuse of data, but speed can be the cost. If we were to use a table to define the "ID number" types, we move towards a more normalized form. The problem is in overextending ourselves in this way. Normalization can be taken to the extreme. Moderation is a key them when deciding on the level and detail of normalization in a database. Many monks from other monasteries have preached the benefits of moderation. Many DBA's would include themselves in that bunch. Joins can become quite a task when every field must be validated against typing tables. Regarding SSN's: I think we still have a while before we reach the point of needing additional or different base numbers. Regardless, I've oft wondered at things like SSNs, IP Addresses, Credit Card Numbers, etc. When adding intelligence into the code, like Credit Card Numbers, we greatly reduce the number of possible variants. By the way... when will a pin be required for the use of my SSN? -Daruma | [reply] |
Re: SSN's possible new Y2K problem?
by rbc (Curate) on Aug 13, 2002 at 18:48 UTC | |
But what if instead of makeing ssn 10 digit's at some point off in the future the SSA decides to change the base of the number from 10 to say 16 (hex) then we would have SSN's with ABCDEF in them? Should hex values be declared as VARCHAR2 in databases? I guess that it doesn't matter if the column is VARCHAR2 or NUMBER. Geez I am cinfused now. | [reply] |
by demerphq (Chancellor) on Aug 14, 2002 at 13:01 UTC | |
BTW, base 36 = [A-Z0-9] Its nice to think that perhaps they might even go to base 62 or base 64, but the likelyhood of error mistaking a lowercase letter for an uppercase one (or vice-versa) makes me think that that is even less likely than hex.
Yves / DeMerphq | [reply] [d/l] |
by seattlejohn (Deacon) on Aug 16, 2002 at 23:08 UTC | |
| [reply] [d/l] |
Re: SSN's possible new Y2K problem?
by tommyw (Hermit) on Aug 14, 2002 at 11:22 UTC | |
Worrying about 1 billion SSNs isn't a problem with the storage arrangement: Oracle will allow you to alter the table to widen the column. alter table OMG_BTW modify (SSN varchar2(10)); will do exactly that. The problem you were having yesterday was due to the fact that you were trying to change the underlying datatype from varchar2 to number. You can't do that if there's any data in the column (I'm not sure you can actually do it even then). Similarly, you can't make a column narrower if there's any data in it (even if the data doesn't require the full width of the column). To perform such an act, you must ensure that all the values are null. The problem is if you start making assumptions in other places about the size of the data values: either declaring enough storage to only hold 9 digits, or to only display 9 digits. These will bite you hard when you suddenly need to squeeze that extra numeral in. And it will be sudden, no matter how much warning you had ;-) As to what type you should use, personally, I'd declare it as number(9, 0) to minimize the risk of getting invalid values in there, and reduce the amount of storage required. After all, you think it's storing a nine digit number, don't you? (Unless, as tye mentioned, you ought to be allowing for foreign values). -- | [reply] [d/l] [select] |
Re: SSN's possible new Y2K problem?
by tbone1 (Monsignor) on Aug 14, 2002 at 13:32 UTC | |
So anyway, what will happen is that the number of people in the US will be constant, or more likely grow slowly thanks to immigration. The number of SSNs will increase linearly with time, not exponentially. There is plenty of time for the SSA to worry about this. Although, from my experience with government contracting in particular and the bureaucratic mind in general, I don't expect them to do anything about it until things reach a crisis.
-- | [reply] |
Re: SSN's possible new Y2K problem?
by Steve_p (Priest) on Aug 14, 2002 at 10:48 UTC | |
Typically, you only want to declare something as a NUMBER if you are going to use it to perform math on the field. This is a way of sending a message to programmers that this field is essentially immutable for this record, at least without good reason. Second, who says that an SSN needs to be a number? By declaring it as alphanumeric, you would have much less chance for pain if all the Social Security numbers are used up. We would have a second Y2K if the government ever decided to switch, but it would be more likely that they would introduce characters to the SSN rather than try to enlarge the record. | [reply] |
Re: SSN's possible new Y2K problem?
by Stegalex (Chaplain) on Aug 14, 2002 at 13:17 UTC | |
- SSN's *are* recycled. - Even if they weren't, do you actually think that there will still be a Social Security sytem by the time they assign SSN 999-99-9999? ~~~~~~~~~~~~~~~ I like chicken. | [reply] |
by krisahoch (Deacon) on Aug 14, 2002 at 16:19 UTC | |
This is correct. They are recycled 25-50 years after a person's death (From SSN people here in Houston). Additionally if there are 1 billion active SSN's out there, then the 1 billonth would be alphanumeric 00a-000-0000 Stick with VARCHAR(9) --KristoferUpdated: Forgot my p tags | [reply] |
by seattlejohn (Deacon) on Aug 16, 2002 at 23:16 UTC | |
SSNs are not assigned sequentially, and certain digit sequences are considered invalid. The first three digits indicate in what state or administrative region the SSN was assigned, and the next two indicate in what group it was assigned. Here is an explanation. | [reply] |