http://www.perlmonks.org?node_id=326491


in reply to OT: Ways of managing changes to a database's structure during a project.

This was a problem that I tried to solve for my work project, so let me give you an idea of the approach I used and how it has fared over the last few years. Note that we don't suffer the same rate of change in the database structure that you describe - after slightly over 3 years we have reached version 111 (decimal) of the database schema.

The application has a single abstract class NVC::DB which mediates all database access and supplies core functionality for its concrete children, each mapping complete individual records from a particular table to instances. Each concrete child supplies initially the table definition, used to provide polymorphic accessor methods for each field and support selects and modifications on the table, and additionally has explicit methods for various aspects of the default behaviour specific to these objects.

When part of the table definition needs to change, both old and new aspects of the definition are retained, along with instructions for any additional population measures required to effect the change. These are used to allow us automatically to upgrade older client sites to a specified version of the database schema when required to supply new functionality or fix bugs.

Here's a simplified example:

package NVC::DB::FileType; use strict; use NVC::DB; use vars qw/ @ISA /; BEGIN { @ISA = qw/ NVC::DB / } BEGIN { __PACKAGE__->register(<<TABLE); table filetype key: id int(11) not null auto_increment unique: name char(32) binary not null mimetype varchar(255) binary not null [version 99 add: iconpath varchar(255) binary using: desc: add filepath for optional icon table: add iconpath varchar(255) binary pop: update filetype set filepath = "image/html.gif" + where mimetype = "text/html" ] [version 1 add: using: desc: populate default filetypes pop: insert into filetype values + (1, 'plain text', 'text/plain'), + (2, 'HTML text', 'text/html'), + (3, 'other', 'application/octet-stream') ] TABLE }

This versioning process was designed to achieve four primary goals: to allow automated database upgrades for individual sites to a given database schema version; to give a clear presentation in the source code of both the history of the table definition and its current state; to maximise code reuse by allowing the automatic generation of class-specific methods providing standard functionality; and to maximise the potential for applying bugfixes by installing the latest code, without being required to do the potentially disruptive database upgrade required for the latest new features.

The automated database upgrade has succeeded very well - new functionality is normally developed separately, so that the changes associated with any new feature are usually encapsulated in a single database version change. The upgrade script is definitely the hairiest piece of code in the application as a whole, but it has been very stable and rarely needs changes.

The clear presentation in the source code has been, for the core tables, a failure - the rate of change in the definitions of these tables has been much greater than anticipated, so that the table definitions in the unfiltered source are making it almost impossible to find the current definition in a mess of nested changes. In practice the problem is somewhat lessened by the fact that the developer is most familiar with these classes, and in the calmer waters of less actively developed classes (where it is particularly needed) the presentation is usually clearer and more helpful.

This problem was much reduced when we reached the point that all client sites were at least at v86, since we took the opportunity to remove most of the existing historical information from the latest sources (relying instead on the CVS repository to access that information if we ever again need it). However it is certain that this will become a problem again as development continues.

The approach of maximising code re-use has, I think, been very successful: having a single clear consistent interface to all these objects also helps to maximise brain re-use.

We haven't fully realised the intent of installing latest code with bugfixes without the database upgrade - it has proved much less practical than anticipated to ensure that changes for new features were isolated so as not to get executed on any code path for sites that did not enable the feature, and several new features were designed from the start to become endemic to the codebase. Despite that, it has at least proved a mostly practical system for use in non-public sites for code development, template and graphic design, and testing. Also, on the whole the NVC sites have shown very good stability in the relevant areas: few bugs in the code, prompt installation of bugfixes, few problems introduced by database upgrades and few problems caused by database/code version incompatibilities.

Hope this helps,

Hugo