Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Maybe database tables aren't such great "objects," after all ...

by sundialsvc4 (Abbot)
on Mar 25, 2011 at 16:35 UTC ( #895533=perlmeditation: print w/replies, xml ) Need Help??

I recently had to deal with some code that is filled with (home grown) code that implements every database table in the application as a Perl object.   It completely conceals the SQL.   Sounds like a great idea, no?   Well, maybe the answer really is, “no.”

Since this is a mature application, there are several readily-observable problems in the design that was selected (and as it evolved over a decade):

  1. To me, the concept really doesn’t provide clarity.   It actually makes it difficult to see where database calls are taking place, and what those calls consist of.   I can’t so easily “just look at” what is going to be given to the database engine.   I have to grep to find where the calls are taking place and in what situations.   Tracking of transaction-boundaries (and in code this old, there are only a few of those) is rather a source of consternation.
  2. Database-specific tests and decisions have been made all over the application, and although they use these table-accessor objects to get their data, the business logic is scattered hither and yon.   For example, if a statement such as if ($account_obj->get_type() eq 'C') ... occurs all over the application, well, it occurs all over the application.
  3. Invariably, the accessors didn’t go quite far enough to do everything, so there is still a lot of SQL string-building in the application anyway.   Two ways of doing things, and inevitable interference between the two.

So, am I slamming the idea of database-accessor classes?   Not quite.   My Meditation is, rather, that “a database table” is an obvious choice – but not, perhaps, the best choice – for what “a high-level object in an application” really ought to be.   Has this concept, in this application and as it was applied in this application, really brought a helpful benefit to the table?   I’m not persuaded.   What would have been better?   I listen fondly to the gently tinkling bells in this familiar sanctuary, and Meditate again.

“A database table,” like any and every other persistent data structure, really is just a mechanism to provide persistence.   An Accounts table is not “an account,” although it persists some data about “an account,” and does so both by its own columns and by its relationships to other tables.   An object that represents “an account” should not merely be a wrapper around an underlying data representation of a table and its columns, but a true, first-class object that is capable of answering questions, e.g. if ( $account->is_checking_account() ) ...   (Which logic, of course, would be used primarily by a Account::Checking superclass.)

If Perl classes are used as wrappers around tables (and I am not altogether rejecting that idea, as I have used it myself), the use of those wrappers should be confined to the implementation of high-level objects.   They should be low-level objects only.   Rigid assumptions about the present data representations – (“what does C mean in the account_type column, and is that column always named account_type and is the letter always C?”   “Welll-l-l, now that you mention it ...”   Heh.) – are things that very easily become pervasive throughout an application, to very destabilizing effect.   Using wrappers merely to get to that “letter C” is not, by itself, an improvement.

  • Comment on Maybe database tables aren't such great "objects," after all ...

Replies are listed 'Best First'.
Re: Maybe database tables aren't such great "objects," after all ...
by JavaFan (Canon) on Mar 25, 2011 at 20:15 UTC
    IMO, database abstraction is useful. But, mapping tables to Perl classes isn't any abstraction at all. It's exposing the details of your database layout to your program - quite the opposite of abstraction.

    I've seen the following design a few times, and it's my favourite:

    
        +------+  +------+  +------+
        | App. |  | App. |  | App. |  ....
        +------+  +------+  +------+
    
        +----------------------------+
        | Shared application library |
        +----------------------------+
    
        +----------------------------+
        | Database Interface Library |
        +----------------------------+
                                                     Application
    ------------------------------------------------------------
                                                     Database
        +--------------+  +--------------+
        | Stored Proc. |  | Stored Proc. |  ....
        +--------------+  +--------------+
    
        +-------+  +-------+  +-------+
        | Table |  | Table |  | Table | ....
        +-------+  +-------+  +-------+
    
    
    Each layer only talks to the layers directly above or below it. Perhaps overkill for a simple web application, and maybe not suitable for an agile environment, but the application not knowing anything about the table structure gives more freedom than one initially would think.
      I've seen the following design a few times, and it's my favourite:

      I could not agree more. SQL has no place in application code.

      And with Postgres' PL/Perl, you can move most of the heavy lifting into the database, close to the data, often avoiding huge comms overhead, whilst sticking with the powerful, intuative (procedural) language we know and love, for performing the manipulations.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      While I might not go quite that far in many cases, I fully agree that 'table==object' is not abstraction. It's just convolution. In most database cases I've worked with, a single object might well have data in multiple tables. (With relations between each other.) Or an object could be made out of just part of a table. Thinking that an object is a table, and vice-versa, is just going to limit your idea of what can be done and what solutions are possible.

        I'm not experienced in using ORMs or the architectures being discussed here, but it seems to me that problems might arise if the database structure is too hidden from the people writing and maintaining an application.

        How do you maintain transactional integrity in a situation where Perl objects reference multiple tables and different Perl objects might have overlapping table references? It seems to me that sometimes, you want to make sure that a number of Objects being updated result in the database tables being updated atomically and not understanding the impact on the underlying tables might lead one to believe this is being done when it's not. Conversely, you might only want to operate on a single object and have that update the table without waiting on operations on associated objects.

        Forgive me if I'm speaking from ignorance of the tools.

      Spot on. +++++
      There is also the common development method where you don't bother with any SQL or relational DDL at all, i.e. you don't have a preexisting database you have to work with so you can just write your domain model in the ORM syntax and it will deploy and manage the DDL and SQL for the target platform.

      There are ORMs like DBIx::Class in Perl, Hibernate in Java, SQLAlchemy in Python, etc. where you write up class files which have entities, their attributes and relationships to the other entities, i.e. the domain object model and the ORM gives you the syntax to clearly and granularly define how you want your domain model to map to the relational database and it will create the relational database based on your class definitions, you don't have to worry about the relational side much at all. More importantly you don't have to worry about any SQL when using such objects in your business logic, there is syntax provided by the ORM to allow you to define what should happen when, e.g. in one transaction or even nest transactions and you just deal with objects, the ORM nicely does the back-end hassle for you.

      Now DBIx::Class is different in many ways to Hibernate and SQLAlchemy, it doesn't follow the full OOP way that the others do and has similar syntax to for example Ruby's ActiveRecord. In Hibernate and SQLAlchemy you define real OOP classes for each of you entities and they follow the standard OOP conventions of that language for defining attributes and methods that get entity relationships. Hibernate and SQLAlchemy give you annotations/decorations that you put on top of the class definition code to tell Hibernate and SQLAlchemy how to do the relational mapping. You typically never do any SQL at all you just work in OOP land in your business logic code and the backend ORM engine manages what needs to be done on the DBMS side.

      I've mentioned possibly having this type of ORM capability for the future version of DBIx::Class, for example one could forgo the current syntax used to define your object-relational mapping and do this instead: write normal Moose class files and have them decorated/annotated with DBIx::Class syntax to define the mapping and also any specific options if you want to change default or add optional ORM behaviors.

      Now I know there is some criticism of these OOP-centric ORMs, I'm not advocating them for everything just mentioning that they can be quite appropriate for certain projects and design strategies, especially if you want to always work in OOP and not worry about SQL much. DBIx::Class for example is in my mind much more flexible at doing very difficult database mappings and operations than the OOP-centric ones, but you then forgo working with Moose objects for you entities.

Re: Maybe database tables aren't such great "objects," after all ...
by moritz (Cardinal) on Mar 25, 2011 at 17:02 UTC

    I agree with your statement that database row objects are fine intermediate level objects, and often not too well suited for most really high level objects.

    That said I find it curious that you complain about not seeing where DB queries happen, and at the same time bemoan the missing abstractions. Abstractions are all about not caring what happens on the low level, so it sounds like you have conflicting goals.

    One more thing: you write about an object representing a table - that's not what the usual ORM does. Most really represent use objects for rows. DBIx::Class is a notable exception, it has both the concept of a "result class" (an object of which is a row), and a "result source (class)", an object of which can stand for a table, or intermediate results from DB queries (ie a set of rows).

    Having this distinction helps to remove some of the confusion around objects, tables and rows, but of course it doesn't help when you really want higher abstractions.

      That said I find it curious that you complain about not seeing where DB queries happen, and at the same time bemoan the missing abstractions. Abstractions are all about not caring what happens on the low level, so it sounds like you have conflicting goals.
      My take on this is that a relational database already has a lot of high-level abstractions going on, and the trouble with ORM mappers (or the way a lot of them are used) is that they dump these abstractions and actually take a step down the scale.

      (Think about it: which is a higher level language: perl or SQL?)

      I don’t mean to use the word, “complain.”   Rather, I meditate on what is the most-appropriate use of these classes, having observed a case which (in my mind) does not properly apply their strengths or reap their benefits in the best way.   The issue of “using outside SQL” (the necessity of it, the (dis)advantages of it, what to do when nothing else is as clear, etc. etc. etc.) is another koan entirely.   They truly are “meditations,” because there are no pat answers to be found.   “Everything is a trade-off.   How and what, then, shall you choose to trade?”   Indeed.

Re: Maybe database tables aren't such great "objects," after all ...
by Your Mother (Archbishop) on Mar 25, 2011 at 17:38 UTC

    Home grown? Well, there's your problem right there.

    This is a difficult problem space. Any naÔve attempt at it is going to be bad. What moritz said about DBIC is important: it has notions of the abstractions that are actually involved and not just that of the table.row and the DBI connection. It also makes them lazy and highly extensible.

    Your point #2 is an important design consideration and why many of us seem to have arrived on the notion of Fat models being the right design. This is quite easy to achieve in DBIC with different base classes, add ins, inflation, etc. Which makes unit testing really easy. E.g., theoretical and untested but close if not already correct–

    use My::Schema; my $schema = My::Schema->connection("dbi:SQLite::memory:"); $schema->deploy; $schema->populate( my_baseline_fixtures() ); # Set 'em all up if you have logic not loaded by default. $schema->load_namespaces( default_resultset_class => "+My::BizLogiks" +); # Or customize per class if necessary. for my $source ( $schema->sources ) { print $source, $/; $schema->resultset($source)->result_class("My::BizLogiks::" . $sou +rce); } # Do some tests!!!

    Update: fixed a typo; extended comment.

      I would enjoy finding references in the literature about what you mean by “Fat models.”   And, by extension, how you may “achieve them” within the DBIC (and DBICx) frameworks.   (These technologies are changing and evolving so fast...)

        Heh. I donít actually know if "fat model" is a term outside the Perl lists/sites I frequent. I donít have links right now but if some turn up, Iíll /msg you and update. I do have some demo/discussion for consideration.

        The idea is simple: business logic is consumed by everything that consumes the model (DB code) so it needs to be in the same place to avoid duplication. Itís an extension of the MVC debate which makes the Controllers *very* thin, doing as little as possible and containing no business logic outside of authorization style stuff, the View being quite dumb and not doing much either, and the Model being "fat."

        In a web app, the Controllers and Views might end up doing things with data; simplistic example: summing. In the fat Model this would be in the schema code.

        This example, being simplistic, will only involve information that is available to the DB/schema already. It could involve external info/models/services. This is where things can get really "fat" if needed. Youíd want the external stuff abstracted well so that you could mock it or run safe dev versions easily.

        Example - checkout_total for a cart full of items

        Cart has items, items have a cart, and items have a cost.

        Obviously the information to get a sum of the items in the cart is available and can be put together a couple different ways. But itís a multi-step process that is going to be needed over and over, so rather than putting it into DBIC syntax (which can be hairy or alterantively verbose for this kind of thing), or raw SQL, or letting an iterator do sums in a view, or some combination of all of those depending on the context, weíll put it in the model.

        Since itís an ORM, we already know what calling it should look like: $cart->checkout_total. There is no limit to how many of this kind of helper we can write and what it can do is only limited by judgement.

        Let your mind race ahead to write out checkout_total. What does it look like? Hereís the idiomatic implementation with DBICĖ

        sub checkout_total { +shift->items->get_column("cost")->sum; }

        Already, I hope, you can see that once you get your chops down, DBIC makes some easy but verbose things even easier and brief. Spelled outĖ

        # Inside My::Schema::Cart or an auxillary class only # loaded into the space on demand. sub checkout_total { # My::Schema::Cart object. my $self = shift; # items is a has_many relationship. my $items_rs = $self->items; # ->items_rs to avoid contex ambiguity +. # Get a DBIx::Class::ResultSetColumn object for cost. my $cost_col = $items_rs->get_column("cost"); # Perform SUM. my $sum = $cost_col->sum; return $sum; }

        After you get your legs, the first version becomes more legible than the second. And this is really a baby example. You can continue this sort of thing, riding the relationships, to achieve really powerful queries that remain readable as they become more complex. The business logic can become self-documenting and every piece of every query can be individually testable which will not be the case with SQL libraries or hand written DBI stuff.

        And now the full, working, demo codeĖ

        That example is fun and actually useful but a little weak because itís based on a Result. Resultset stuff is where it gets really cool. There is a really good walk-through of that here: dbix-masterclass.xul, though no one seems to support XUL at this time so you might just have to read the plain text source of the slides.

        The caveat with DBIC is that itís not easy. The learning curve is pretty steep. I think it amply repays the investment.

Re: Maybe database tables aren't such great "objects," after all ...
by ELISHEVA (Prior) on Mar 26, 2011 at 17:57 UTC

    The notion that we can just assume that object = table row or class = table is fundamentally flawed. At the simplest level, objects are about conceptual consolidation and classification. Tables are about deconstruction and normalization. These are very different things. In a well designed application, when there is a perfect one-to-one relationship between table architecture and object architecture it is almost always a special case.

    Frequently, as time progresses and the number of use case stories and business rules expand or change to meet new needs, that apparent one-to-one relationship breaks down. My guess is that your work with this aging system made it obvious because you could see this time effect in action.

    When we design databases we deliberately fracture our natural conceptions of "things" into a multitude of tables and rows. This fracturing is necessary so that each row of a table has certain mathematical properties. DBMS's rely on us translating our domain knowledge into an awareness of mathematical properties like functional dependence. So long as the rows of the database possess at least the properties of 3rd normal form, operations like inserts, deletes, validations, and joins are well behaved. They can be performed by a database engine that has absolutely 0 domain knowledge about the content of each row.

    Objects are the exact opposite. They are meant to encapsulate domain specific knowledge and reglue together the fractured object. If they relate to anything in a database, it would be a row in a database view, rather than a table. A database view is also meant to capture a consolidated view of the data that reflects the way we think about and work with data in our normal on-going activities.

    The difference between the two shows up both practically and theoretically. Take the classic "order form". To the database, an order form could be anywhere from 2 to hundreds of tuples spread across a minimum of two tables (order header, order line). Depending on the complexity of the data on the form, the data could be spread across many more tables (product tables, price tables, state tax tables, customer contracts, tables tracking special promotions, etc, etc). To the order taker, it is a single unit. The job of an application is to build a bridge between the user's view of an order - a single unit to review, print, approve, fulfill, and ship - and the database's view: a multitude of tuples with certain mathematical relationships to each other. How can an application translate between the two views if it insists on pretending they are synonymous?

    On a more theoretical level there are also problems with presuming a one-to-one relationship between table rows and class objects. Suppose we have two collections of tuples A and B. Tuple like A contains attributes (a,b,c) and use 'a' as the primary key. Tuples like B contain attributes (a,b,c,d,e,f). They also use 'a' as the primary key. Attributes d,e,f are not-applicable to tuples like A.

    From an object point of view we'd have two classes. Superclass A would store attributes (a,b,c). Subclass B would extend A and would add the attributes (d,e,f). Our application will have a mix of A and B objects since not all of our objects need (d,e,f).

    Now look at the database. Depending on how concerned you are about wasted space and 4th/5th normal form, you will either have one or two tables. In the one table model there will be a single table with columns (a,b,c,d,e,f). Tuples like A will just have null values in (d,e,f). If you don't want to waste space or want to enforce 4th/5th normal form, you will likely have two tables: A with columns (a,b,c) and B with columns (a,d,e,f).

    Now look at the object-table mapping. In neither case will you a simple one-to-one relationship between the rows in a table and the objects in a class. In the single table model you have one table and two classes. Some of the rows will be A object and some will be B objects.

    You are no better off in the double table model. All the rows in the B table are B objects. Superficially it looks like the object model and the database model match - two classes and two tables. However, the A table still contains a mix of A and B objects. You will still need two queries or database views: one to select A objects from the A table and one to select B objects. Furthermore, we can't just take a row from the B table and create a B object. To get all the data needed to build the B object you have to join the A and B tables.

    Now you could force a one-to-one relationship by defining an A table with attributes (a,b,c) and a B table with (a,b,c,d,e,f). However, now you've made it really hard to enforce key uniqueness. It can be done with triggers and custom programming but you can't use the built-in per-table uniqueness constraints because your list of already-used keys is split between two tables.

    Fat models are no solution. From what I can tell on the internet a fat model moves validation of data from the database to the application. I can't even begin to say what a dumb idea this is if there is any possibility of that database being used by more than one application. Suppose you have five different applications all using the same database. If the validation rules are in the database then the rules are defined and maintained in a single place. If you move the validation outside of the database, you now have to replicate the validations in each of the five applications. What happens if the validation rules change? What about security?

    If the rules are enforced in application code outside of the database, it is a lot easier to create a rogue application that intentionally corrupts or falsifies data. You could eliminate the need to change data validation rules in 5 different applications by insisting that all applications relate to the database via some sort of validation application. However, that is a much softer constraint than an in-database validation that refuses to save data and rolls back transactions when validations fail. Enforcing the rule requires a great deal of human code review effort. It can be done, but it is labor intensive. Rogue applications aren't going to use that layer and they are going to do their best not to get noticed and fed into the code review process.

Re: Maybe database tables aren't such great "objects," after all ...
by sundialsvc4 (Abbot) on Mar 28, 2011 at 01:05 UTC

    Responding to this topic in the way that it is naturally flowing ...

    What I like to do is to treat the database as a vehicle for persistence, nothing more or less.   Each program that works with the database is drawing from the database to populate its (application specific) “view of the world,”   When a change to that data has been prepared and confirmed, the application then “does whatever it needs to do” to the database(s) to permanently persist that change.   But there should be no assumption that physical database structures “correspond to reality.”   Businesses are always dreaming up new twists on the same old thing, but database structures are not easy to change and in any case must remain compatible with a lot of old code.

    And when it actually comes time to do the retrieval and the saving, I find that it really does work (just as) well to use SQL queries.

    I recently built a Moose application which is “ye olde shopping cart,” and in this application I made full use of Moose’s ability to easily specify that an object is “storable,” i.e. into the Session store of a web-server.   The shopping cart and all of its contents were thus simply pure-Perl objects; nothing more or less.   They persisted (only) in the Session store until the time came, if it ever did, to finalize the order.   The appropriate objects possessed the ability to populate themselves out of the database and to replace themselves into it, but only they knew (or cared) how it was done.   The strategy worked extremely well.   The design was extremely clean.

      If I may ask out of interest how accomplish you do this? Roll your own Moose-based ORM?
Re: Maybe database tables aren't such great "objects," after all ...
by cavac (Curate) on Mar 28, 2011 at 17:26 UTC

    When working with databases in general, in my mind the first question to ask is "how much data will i be working with".

    Depending on the answer, a high level approach may make coding easier and/or it might kill the performance of the application.

    Take your example of accounts (i assume money accounts, not the type you use for logging into a system). If you plan for a low number of entries, the high level approach will be benefitial. On the other hand, if you are coding for a bank, loading a gazillion of objects into your perl interpreter to do the daily sum may be.. suboptimal.

    When dealing with a high volume of data, let the database do it's job. In all likelyhood, a few hundred developers spent the last decade or two optimizing the heck out of that database. Also, modern databases have optimized server side scripting - no need to send thousands of rows back and forth between database and perl just to check if the account holder still has enough money left to pay his/her electricity bill.

    That said, having Perl classes that wrap all this SQL stuff into nice packages can be quite benefiting. But that also depends on how the various data subsystems interact - you may or not be forced to "break the rules" in certain cases to get the required performance. Or... your do some of the abstraction in whatever means you database gives you in terms of views, server side scripting and whatnot.

    So, always more than one way to do it there is, Luke. ;-)

      /me nods...

      You are definitely correct on all these points.   Thank you.   Many sides to The Force there are, young Skywalker.   He who tries to wedge all of it into one basket, survive will he not, no matter what or how large the basket might be.

Re: Maybe database tables aren't such great "objects," after all ...
by erix (Prior) on Mar 28, 2011 at 20:21 UTC

    Don't forget that many people (and I tentatively agree with them) do not like ORMs at all (let alone homegrown halfway-products), under any circumstance:

    ORMs are the Vietnam of Computer Science

    It seems kind-of logical to me that programmers like ORMs, and that DBAs dislike them (even while DBAs can be programmers, and programmers may administer databases).

    The problem is that when things are simple ORMs are nice but not necessary, and when things become complicated too often they cannot do what is needed.

      Nice article, but it does not need Vietnam.
Re: Maybe database tables aren't such great "objects," after all ...
by hermida (Scribe) on Apr 06, 2011 at 18:33 UTC
    I forgot to mention that with most modern ORMs you don't have to be stuck having to think of one class = one table, they give you a lot of flexibility in how you want to map your entity class diagram. For example, do you want abstract classes to have their own table? You can if you want or you can tell the ORM to make each class hierarchy a single table. Do you want multiple classes to map to one table, you can. You can even have your entire entity class diagram be a single database table.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://895533]
Approved by moritz
Front-paged by moritz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2021-10-21 05:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My first memorable Perl project was:







    Results (82 votes). Check out past polls.

    Notices?