http://www.perlmonks.org?node_id=246032

By 'No Fear', I am suggesting that with sufficient decoupling of your application from the database, you can feel free to make changes to your application and to your database without being trapped by the terror of "what if we break the application that our entire company depends on?" Fear-based development is common, but clearly we recognize that what we really want is goal-based development. If we don't reach for goals because fear is holding us back, we've limited our company's opportunities.

Next month, I will be giving a talk at Portland Perl Mongers on this topic. A rough description of my talk is as follows:

=head1 NAME

use strict 'sql';

=head1 SYNOPSIS

  my $sql  = 'SELECT * FROM that'    and die "Don't do that!";
  my $data = $sth->fetchrow_hashref  and die "Or that!";
  my $sql  = 'SELECT this FROM that' and die "Still bad";
  use base 'Class::DBI';             # much better

=head1 OVERVIEW

Many people misuse SQL. While some of the above can be fine for a short
script, we should be careful about how SQL is used in a production
environment. C<use strict 'sql';> will detail why the above constructs can
lead to non-scalable code.

First, we'll show some examples of bad SQL and then move on to better SQL
with bad implementations (hint: I don't avoid C<&DBI::fetchrow_hashref> for
performance reasons). We'll finish with a quick discussion of how object
persistence modules can help lead us out of the quagmire.

=head1 NOTE

I'll primarily focus on C<Class::DBI> for the object persistence, but I'll
touch on some of the other OO persistence modules such as Tangram and
Alzabo. I'll also discuss what I think is a bit of a design issue with
C<Class::DBI> (sorry Schwern!) and, by the end of the talk, you'll
understand why I feel that way.

Side note: despite the design issue I alluded to, C<Class::DBI> rocks!
Joe Bob says "check it out!"

=cut

In short, the thesis is that with proper abstraction of SQL, coupled with adequate tests (you didn't think I'd skip that, did you?), you can build more robust applications and worry less about database changes, even dramatic ones. Anecdotes welcome! This post is essentially a request for your ideas, insights, experiences, etc., regarding this topic.

Cheers,
Ovid

Re: No Fear: decoupling your application from the database
by perrin (Chancellor) on Mar 26, 2003 at 18:40 UTC
    I'm giving a talk on this subject at OSCON. Here are a few random notes:

    • You don't need Class::DBI to get abstraction. Hand-coding these classes is a good idea in certain situations (mostly when you have a complex database).
    • SPOPS seems to be a bit more extensible than Class::DBI.
    • None of these give you as much ability to change as XML::Comma. It lets you redefine data structures on the fly and adjusts the database tables for you as necessary.

    I'd definitely be interested in hearing more of your ideas about this. I'm not following the bit about fetchrow_hashref.

      Thanks for the input. The problem I have with &DBI::fetchrow_hashref is the same problem we have with "SELECT * FROM that". Essentially, what we're doing is trying to fetch both the data and the metadata. For a robust application, we should fetch the data based upon our foreknowledge of the metadata. For instance, consider the following:

      while (my $data = $sth->fetchrow_hashref) {
          my $total = $data->{subtotal} * $tax_rate;
          # do stuff
      }

      Now imagine that the above code generates the following warning:

      Use of uninitialized value in multiplication (*) at foo.pl line 30.

      We have no way of knowing, from that warning, whether the problem is an undefined value in the hash or a non-existent hash key (assuming we know that $tax_rate is defined). Problems like this are more prevalent when we're essentially fetching two types of data. Also, many argue that we shouldn't be doing direct data access like that. If we retrieve the data through a clean API rather than by reaching directly into the hash, we can tweak the internals to ensure that we're returning expected data and handling errors gracefully.
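
      For instance, a thin row wrapper along the following lines (the class
      and column names here are invented purely for illustration) turns a
      silent undef into a loud, precise error:

      package Customer::Row;
      use strict;
      use warnings;
      use Carp 'croak';

      # Name the expected columns up front; the SQL should select exactly
      # these, in this order.
      my @COLUMNS = qw( customer_id subtotal );
      my %KNOWN   = map { $_ => 1 } @COLUMNS;

      sub new {
          my ($class, $sth) = @_;
          my @row = $sth->fetchrow_array or return;
          my %data;
          @data{@COLUMNS} = @row;
          return bless \%data, $class;
      }

      sub get {
          my ($self, $column) = @_;
          croak "No such column '$column'" unless $KNOWN{$column};
          return $self->{$column};
      }

      1;

      Now $row->get('subtotl') dies with a useful message pointing at the
      typo, instead of quietly handing back undef.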

      In glancing at SPOPS, I can see that it looks very interesting and has some features I've wanted in Class::DBI. I'll dig into it more. Given my lack of time, I'll probably maintain my focus on Class::DBI, but include SPOPS and perhaps update my presentation in the future. Also, quite frequently at our Perl Mongers meetings, people start asking gnarly questions about how a particular technology works. I suspect that I can answer that much better about Class::DBI than SPOPS :)

      Also, I completely agree that Class::DBI is not necessary for abstraction. I'll present other models, including a hand-rolled PersistentObject module that I wrote prior to discovering POOP. I've also dug up an example of this in Python that is perfect as an example of how not to do it. It's not that it's written in Python, but that it's intrinsically tied to the column names in the database.

      Cheers,
      Ovid


        Hmmm. I'd say fetchrow_hashref is not as bad as "select *" in most cases, since with "select *" you could end up getting a whole bunch of extra columns you don't want, or get the columns in a different order. Using the name of a column in fetchrow_hashref is essentially the same as using the name of the column in a SQL statement. (Of course, I don't use it because of the performance hit...)
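
        On the performance point: the usual faster route in DBI is
        bind_columns, which reuses the same scalars for every row instead
        of building a fresh hash per row. A minimal sketch (the connection
        details and table are invented for the example):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:SQLite:dbname=orders.db', '', '',
                               { RaiseError => 1 });
        my $sth = $dbh->prepare('SELECT order_num, subtotal FROM orders');
        $sth->execute;

        # Bind each selected column to a lexical; fetch() refreshes them
        # in place on every iteration.
        $sth->bind_columns(\my ($order_num, $subtotal));
        while ($sth->fetch) {
            print "$order_num: $subtotal\n";
        }
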
      XML::Comma definitely looks extremely interesting, especially as a replacement for CGI::Application once you scale. I especially like the idea of reducing the need for the database, instead depending on the filesystem.

      Unfortunately, I suspect that it would have issues in non-document-based applications, such as financial reporting systems. *ponders*


Re: No Fear: decoupling your application from the database
by dws (Chancellor) on Mar 26, 2003 at 19:35 UTC
    In short, the thesis is that with proper abstraction of SQL, ...

    I'd like to see an example of how to generate a non-trivial query (say, one involving an OUTER JOIN) once you've abstracted away the SQL.

    It's been my experience that simple mapping schemes work great as long as you've got a 1-to-1 mapping between classes and tables, but start to fall apart when the queries get complex. That's a major sticking point for a lot of people. If you can address that in your presentation, you might gain some more converts.

      The point of my presentation will not be to generate all SQL on the fly (though it's often good to do as much of this as possible), but to decouple the application from the database. For example, imagine that you want to fetch a list of all customers along with their order numbers, if they've ever placed an order. How might this be implemented?

      sub order_numbers {
          # assume, for the sake of argument, that order numbers
          # are optional
          my $class = shift;
          my $sql   = <<'END_SQL';
      SELECT c.last_name, c.first_name, o.order_num
      FROM   customers c
      LEFT OUTER JOIN orders o
           ON c.customer_id = o.cu_id
      END_SQL
          return $class->_fetch_records($sql);
      }

      That looks reasonable enough, but what happens if, for example, you want to rename orders.cu_id? Every place in your code where this SQL is embedded will break. Here's a different approach.

      sub order_numbers {
          my $class     = shift;
          my $customers = $class->_table;
          my $orders    = Order->_table;
          my %cust  = map { $_ => $class->_column($_) }
                      qw{ first_name last_name customer_id };
          my %order = map { $_ => Order->_column($_) }
                      qw{ order_num customer_id };
          my $sql   = <<"END_SQL";
      SELECT $customers.$cust{last_name},
             $customers.$cust{first_name},
             $orders.$order{order_num}
      FROM   $customers
      LEFT OUTER JOIN $orders
           ON $customers.$cust{customer_id} = $orders.$order{customer_id}
      END_SQL
          return $class->_fetch_records($sql);
      }

      Now, you can change your column and table names at will, but you don't break your code so long as you update your field map in the object that handles the appropriate table. Admittedly, this is much more work, but it's more robust and the bulk of the SQL can still be auto-generated. In my experience, the amount of work you save by abstracting out the rest of your SQL easily offsets messes like the above snippet.
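
      For completeness, _table and _column don't need to be anything fancy.
      A minimal sketch (the package and names here are hypothetical) is just
      a per-class map from logical names to physical ones:

      package Customer;
      use strict;
      use warnings;
      use Carp 'croak';

      # Logical names on the left, physical column names on the right.
      # Renaming a column in the database means editing one line here.
      my $TABLE   = 'customers';
      my %COLUMNS = (
          first_name  => 'first_name',
          last_name   => 'last_name',
          customer_id => 'customer_id',
      );

      sub _table { $TABLE }

      sub _column {
          my ($class, $name) = @_;
          croak "No column '$name' in $TABLE" unless exists $COLUMNS{$name};
          return $COLUMNS{$name};
      }

      1;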

      Naturally, since I expect to see a full test suite with all of that, I'll still catch the errors, if any, in said suite.

      Cheers,
      Ovid


        I'm not convinced that this extra abstraction is worth the trouble. Readability definitely suffers, and the ability to hand SQL to a DBA for help is hampered. (You have to evaluate the code and then dump the SQL out first.) More importantly, any non-trivial change to your data structures will still break everything. This only protects from changes in names.

      I've used the join facility in Alzabo only for trivial things, but so far it has handled them fine.

      I've just altered the node and added an example from my own code where I use Alzabo's join facility. You'll see it right inside the while() loop.

      sub display_address {
          my $self   = shift;
          my $tmpl   = $self->load_tmpl('display_address.tmpl');
          my $schema = $self->param('schema');
          my $q      = $self->query;

          # Retrieve the parameters
          my $street = $q->param('street');
          my $house  = $q->param('house');

          my @people;
          {
              my $roster        = $schema->Roster;
              my $history       = $schema->History;
              my $roster_fk     = $history->RosterFk;
              my $election      = $schema->Election;
              my $election_date = $election->Date;
              my $election_desc = $election->Desc;

              my $people = $roster->rows_where(
                  where => [
                      [ $roster->StreetFk,    '=', $street ],
                      [ $roster->HouseNumber, '=', $house  ],
                  ],
                  order_by => [ $roster->Name ],
              );

              # Retrieve the result set into an array
              while (my $person = $people->next) {
                  my @votes = $schema->join(
                      join     => [ $history, $election ],
                      select   => [ $election ],
                      where    => [ $roster_fk, '=', $person->SosId ],
                      order_by => [ $election_date, $election_desc ],
                  )->all_rows;

                  # Fetch the values from the selected data
                  @votes = map $_->Date . ': ' . $_->Desc, @votes;
                  push @people, {
                      History => join('<br />', @votes),
                      Name    => $person->Name,
                  };
              }
          }
          $tmpl->param('people', \@people);
          return $tmpl->output;
      }

        Alzabo also does outer joins pretty simply. At least, it's simple in terms of scaling up from the complexity of a non-outer join example. There's still that initial learning curve and all ;)

        my $cursor = $schema->join(
            select => [ $table1, $table2 ],
            join   => [
                [ left_outer_join => $table1 => $table2 ],
                [ $table2 => $table3 ],
            ],
            ...
        );

        The current CVS version also adds support for specifying join conditions as in this SQL:

        SELECT ...
        FROM   Foo
        OUTER JOIN Bar
               ON  Bar.foo_id = Foo.foo_id
               AND Bar.size > 2
        WHERE  ...

        The Alzabo syntax for it is a bit ugly so far though.

Re: No Fear: decoupling your application from the database
by mpeppler (Vicar) on Mar 26, 2003 at 21:46 UTC
    I agree wholeheartedly with the objective. Personally, I tend to use stored procedures for all SQL interaction, and table-driven database interaction through those stored procedures.

    As long as the interface between the different parts stays constant, the underlying pieces can change without affecting the entire project (and in particular, queries can be tuned without touching the application, i.e. the Perl code).
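
    For example, the calling code can know nothing beyond the procedure's
    name, parameters, and result columns (a sketch only; the server name,
    procedure, and data are invented for illustration):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Sybase:server=PROD', 'app_user', 'secret',
                           { RaiseError => 1 });

    # The procedure's signature is the whole contract; the SQL inside it
    # can be rewritten or tuned by the DBA at will.
    my $sth = $dbh->prepare('exec get_customer_orders ?');
    $sth->execute(42);    # customer id
    while (my @row = $sth->fetchrow_array) {
        print join("\t", @row), "\n";
    }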

    The use of stored procedures is of course a portability problem, but the advantages afforded to the DBA are immense (and right now I wear a DBA hat, so... :-)

    Michael

Decoupling not just halfway but all the way
by htoug (Deacon) on Mar 27, 2003 at 08:43 UTC
    We have moved away from having any close contact with the database in the code.

    Database access is handled by a separate, carefully handcoded layer that handles the business logic and transaction handling. Access to that layer is through well-documented access methods that represent the problem space, not the database implementation. The programmer thinks in terms of, e.g., invoices, so we define a method that hands back an invoice given its key and a method that updates or inserts an invoice, both representing the invoice in a way that is natural for the application. The complexity of storing the invoice across its interrelated set of two, three, or more tables is hidden. An application of the well-proven divide-and-conquer approach!
    This gives us the ability to change the database structure without changing the myriad of code accessing it, and relieves the programmers of having to deal with the intricacies of coding against the database system. We find that a query often involves more than one table, and in surprisingly many cases it cannot be expressed as a single SQL statement but requires several, as in the sketch below.
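
    To illustrate the shape of such a layer (all table, column, and method
    names here are hypothetical), fetching one "invoice" may quietly run
    several statements against several tables:

    package InvoiceStore;
    use strict;
    use warnings;

    sub fetch_invoice {
        my ($self, $invoice_key) = @_;
        my $dbh = $self->{dbh};

        # Header and lines live in separate tables; the caller never knows.
        my $head = $dbh->selectrow_hashref(
            'SELECT invoice_key, customer_id, issued_on
               FROM invoice_head WHERE invoice_key = ?',
            undef, $invoice_key,
        );
        my $lines = $dbh->selectall_arrayref(
            'SELECT line_no, description, amount
               FROM invoice_line WHERE invoice_key = ?
              ORDER BY line_no',
            { Slice => {} },
            $invoice_key,
        );
        return { %$head, lines => $lines };
    }

    1;
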
    All this naturally leads you to implement the database access layer as a separate transaction server, which we have found is a nice productivity boost.

    YMMV of course, but having coded this way for roughly the last seven years, after having done it in all kinds of ways since I met my first RDBMS in the late '80s, I can only recommend that you give it a try.
    Of course SQL generators can help with the data access layer a bit, but in my experience not enough to repay the effort of writing them.

Re: No Fear: decoupling your application from the database
by pg (Canon) on Mar 27, 2003 at 06:32 UTC
    This would certainly help shorten the development cycle, but I have a performance concern about generating SQL statements automatically. (This is a general concern rather than an objection to your idea; I have read all the replies you have posted so far, and I know you are not trying to generate all SQL on the fly.)

    My database experience is mainly with Oracle and reasonably large databases (some tables run to 3 million records; for example, we have a table holding three months of sales history). My experience tells me that, most of the time, your host language will not cause serious performance problems, even if you are not a wizard at it. Performance problems are almost always caused by the database, and one major source is your SQL statements.

    A slight but thoughtful adjustment to your join or your WHERE clause can make a SQL statement 100 times faster, or even more. I simply think it would be extremely difficult to generate SQL statements that perform well consistently, and it is surely harder still to keep performance consistent across different databases.
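
    As a concrete (and hypothetical) illustration of the kind of adjustment
    I mean: the two statements below return the same answer, but wrapping
    the indexed column in a function in the first prevents Oracle from
    using a plain index on sale_date, while the range form can use it.

    # Same result, very different plans (table and column names invented).
    my $slow = <<'END_SQL';
    SELECT SUM(amount) FROM sales_history
    WHERE  TRUNC(sale_date) = TO_DATE('2003-03-01', 'YYYY-MM-DD')
    END_SQL

    my $fast = <<'END_SQL';
    SELECT SUM(amount) FROM sales_history
    WHERE  sale_date >= TO_DATE('2003-03-01', 'YYYY-MM-DD')
    AND    sale_date <  TO_DATE('2003-03-02', 'YYYY-MM-DD')
    END_SQL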

    Another source of difficulty is differences in SQL syntax; the basics are the same everywhere, but some vendor extensions exist precisely for performance.

      My experience tells me that guessing where a performance problem might be is almost always wrong.

      The easier it is to write and to modify code, the more time I'll have to run a profiler, find the real bottlenecks, and optimize what really matters.
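
      With DBI, much of that profiling even comes built in: DBI::Profile
      will aggregate time per statement with a one-line change (a sketch;
      the DSN is invented):

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect('dbi:SQLite:dbname=orders.db', '', '',
                             { RaiseError => 1 });

      # Group timing data by SQL statement; a summary report is written
      # to STDERR when the handle is destroyed.
      $dbh->{Profile} = '!Statement';

      # ... run the application's queries as usual ...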

Re: No Fear: decoupling your application from the database
by Aristotle (Chancellor) on Mar 31, 2003 at 02:33 UTC