|Problems? Is your data what you think it is?|
Re: Maybe database tables aren't such great "objects," after all ...by ELISHEVA (Prior)
|on Mar 26, 2011 at 17:57 UTC||Need Help??|
The notion that we can just assume that object = table row or class = table is fundamentally flawed. At the simplest level, objects are about conceptual consolidation and classification. Tables are about deconstruction and normalization. These are very different things. In a well designed application, when there is a perfect one-to-one relationship between table architecture and object architecture it is almost always a special case.
Frequently, as time progresses and the number of use case stories and business rules expand or change to meet new needs, that apparent one-to-one relationship breaks down. My guess is that your work with this aging system made it obvious because you could see this time effect in action.
When we design databases we deliberately fracture our natural conceptions of "things" into a multitude of tables and rows. This fracturing is necessary so that each row of a table has certain mathematical properties. DBMS's rely on us translating our domain knowledge into an awareness of mathematical properties like functional dependence. So long as the rows of the database possess at least the properties of 3rd normal form, operations like inserts, deletes, validations, and joins are well behaved. They can be performed by a database engine that has absolutely 0 domain knowledge about the content of each row.
Objects are the exact opposite. They are meant to encapsulate domain specific knowledge and reglue together the fractured object. If they relate to anything in a database, it would be a row in a database view, rather than a table. A database view is also meant to capture a consolidated view of the data that reflects the way we think about and work with data in our normal on-going activities.
The difference between the two shows up both practically and theoretically. Take the classic "order form". To the database, an order form could be anywhere from 2 to hundreds of tuples spread across a minimum of two tables (order header, order line). Depending on the complexity of the data on the form, the data could be spread across many more tables (product tables, price tables, state tax tables, customer contracts, tables tracking special promotions, etc, etc). To the order taker, it is a single unit. The job of an application is to build a bridge between the user's view of an order - a single unit to review, print, approve, fulfill, and ship - and the database's view: a multitude of tuples with certain mathematical relationships to each other. How can an application translate between the two views if it insists on pretending they are synonymous?
On a more theoretical level there are also problems with presuming a one-to-one relationship between table rows and class objects. Suppose we have two collections of tuples A and B. Tuple like A contains attributes (a,b,c) and use 'a' as the primary key. Tuples like B contain attributes (a,b,c,d,e,f). They also use 'a' as the primary key. Attributes d,e,f are not-applicable to tuples like A.
From an object point of view we'd have two classes. Superclass A would store attributes (a,b,c). Subclass B would extend A and would add the attributes (d,e,f). Our application will have a mix of A and B objects since not all of our objects need (d,e,f).
Now look at the database. Depending on how concerned you are about wasted space and 4th/5th normal form, you will either have one or two tables. In the one table model there will be a single table with columns (a,b,c,d,e,f). Tuples like A will just have null values in (d,e,f). If you don't want to waste space or want to enforce 4th/5th normal form, you will likely have two tables: A with columns (a,b,c) and B with columns (a,d,e,f).
Now look at the object-table mapping. In neither case will you a simple one-to-one relationship between the rows in a table and the objects in a class. In the single table model you have one table and two classes. Some of the rows will be A object and some will be B objects.
You are no better off in the double table model. All the rows in the B table are B objects. Superficially it looks like the object model and the database model match - two classes and two tables. However, the A table still contains a mix of A and B objects. You will still need two queries or database views: one to select A objects from the A table and one to select B objects. Furthermore, we can't just take a row from the B table and create a B object. To get all the data needed to build the B object you have to join the A and B tables.
Now you could force a one-to-one relationship by defining an A table with attributes (a,b,c) and a B table with (a,b,c,d,e,f). However, now you've made it really hard to enforce key uniqueness. It can be done with triggers and custom programming but you can't use the built-in per-table uniqueness constraints because your list of already-used keys is split between two tables.
Fat models are no solution. From what I can tell on the internet a fat model moves validation of data from the database to the application. I can't even begin to say what a dumb idea this is if there is any possibility of that database being used by more than one application. Suppose you have five different applications all using the same database. If the validation rules are in the database then the rules are defined and maintained in a single place. If you move the validation outside of the database, you now have to replicate the validations in each of the five applications. What happens if the validation rules change? What about security?
If the rules are enforced in application code outside of the database, it is a lot easier to create a rogue application that intentionally corrupts or falsifies data. You could eliminate the need to change data validation rules in 5 different applications by insisting that all applications relate to the database via some sort of validation application. However, that is a much softer constraint than an in-database validation that refuses to save data and rolls back transactions when validations fail. Enforcing the rule requires a great deal of human code review effort. It can be done, but it is labor intensive. Rogue applications aren't going to use that layer and they are going to do their best not to get noticed and fed into the code review process.