PerlMonks  

OT: Ways of managing changes to a database's structure during a project.

by EvdB (Deacon)
on Feb 04, 2004 at 10:50 UTC

I am currently working on a project which is heavily database dependent. As it is a work in progress there are plenty of changes being made all the time - both to the code and to the structure of the database (i.e. its tables and their columns, triggers, indices etc.).

This would usually not be a problem: to make a change to the database I would alter the SQL that creates it, drop the database, recreate it, load up the test data and continue work. This is all fine and dandy, however...

The project is about to go live (web-based shop) and my problem is that on the live system data will need to be kept over the changes.

My problem is how to go about making these changes to the database structure whilst making it possible to apply them to a production system. I have come up with the following possible methods but would like some feedback on them before committing to one.

  1. Transfer script: the old database is kept and a new one is created. Data is then transferred from the old one to the new one using a Perl script which also takes into account any structural changes.
  2. Make changes using SQL commands: all changes to the database are made using 'ALTER table ADD column' commands so that all the data remains in place. Essentially a series of patches is applied.
  3. Store data and reload it: all the data is stored to file (possibly using XML or YAML), the database is recreated and the data is then reloaded into the new database.

There are benefits and problems to each one. All of them would require keeping track of which version of the structure is currently being used and then acting accordingly.

As I see it this is a general problem for database-backed projects, but I have not seen much discussion of it here or elsewhere (please correct me if I am wrong). For the record, I am using Postgres 7.4.

I am personally tending towards using SQL commands stored in files named something like 'alter-2.3-2.4.sql', which would change the database from version 2.3 to 2.4.
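A minimal sketch of that patch-file scheme might look like the following. Note this is my own illustration, not code from the thread: the `schema_version` table, the patch keys, and all table names are invented, and SQLite stands in for Postgres so it runs standalone. In practice each patch body would be read from a file such as 'alter-2.3-2.4.sql'.

```python
# Versioned-patch sketch: patches keyed by (from_version, to_version)
# are applied in sequence until no further patch matches, and the
# current version is tracked in a schema_version table in the database
# itself. All names are illustrative.
import sqlite3

def current_version(db):
    row = db.execute("SELECT version FROM schema_version").fetchone()
    return row[0] if row else None

def apply_patches(db, patches):
    """patches: {(from_version, to_version): sql}; applies them in order."""
    version = current_version(db)
    while True:
        step = next(((f, t) for (f, t) in patches if f == version), None)
        if step is None:
            return version
        db.executescript(patches[step])
        db.execute("DELETE FROM schema_version")
        db.execute("INSERT INTO schema_version VALUES (?)", (step[1],))
        version = step[1]

db = sqlite3.connect(":memory:")
db.executescript(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY);"
    "CREATE TABLE schema_version (version TEXT);"
    "INSERT INTO schema_version VALUES ('2.3');"
)
patches = {
    ("2.3", "2.4"): "ALTER TABLE orders ADD COLUMN discount REAL;",
    ("2.4", "2.5"): "ALTER TABLE orders ADD COLUMN notes TEXT;",
}
print(apply_patches(db, patches))   # applies 2.3->2.4, then 2.4->2.5
```

Keeping the version inside the database means the upgrade tool can always work out which patches still need to run.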

Another consideration: How can all this be tested?

--tidiness is the memory loss of environmental mnemonics

Re: OT: Ways of managing changes to a database's structure during a project.
by atcroft (Monsignor) on Feb 04, 2004 at 11:08 UTC

    Have you considered maybe using CVS, not just for the project files themselves, but also for the files containing the SQL for generating/altering the tables and any transfer/modification/utility scripts you may create as well? Just a thought...

      Fear not - all of it lives in CVS. Indeed I would tie in the upgrade/whatever scripts to move between various releases from the CVS.

      --tidiness is the memory loss of environmental mnemonics

Re: OT: Ways of managing changes to a database's structure during a project.
by gmax (Abbot) on Feb 04, 2004 at 11:16 UTC

    I would go for your first solution, transferring data from one database to another, and there are several reasons for that:

    • Having a new database created and filled with data, you can actually test your data with the existing code before switching.

    • With the other approaches, if an error occurs, it is more difficult to roll back to your initial stage.

    On a side note, even though I know that it may be late for you, I would say that changing data structures during development could be a symptom of a weak design. Perhaps spending more time at the design desk would save you all this ordeal.

     _  _ _  _  
    (_|| | |(_|><
     _|   
    
      I hear what you are saying about the testing - it is a very valid point.

      As for getting it right at the design stage, I agree, but it is not always possible or even desirable. I happen to have very few changes to make, but a change is a change and requires respect - hence the OP.

      Changes can occur for lots of reasons; usually they are made to add features. For example, say I am (and I really am) designing a small, simple web shop. After a few months of trading it turns out that discounts would be a good thing for customers spending over a certain amount. To accommodate this the database (and code) needs to be changed. This sort of thing cannot always be spotted at the initial design stage, and it would be impossible to take into account all the possible future features that will be needed.

      In summary: Yes, getting it right at the start is a good thing, but so is making changes based on lessons learnt.

      --tidiness is the memory loss of environmental mnemonics

        After a few months of trading it turns out that discounts would be a good thing for customers spending over a certain amount. To accommodate this the database (and code) needs to be changed.

        Actually, I would think that for adding such functionality you would only need to add to a DB, not actually modify the existing structure. (I don't consider table additions to be a modification.) Perhaps your tables weren't sufficiently normalized? ;-) (Code changes would for sure be required.)

        Anyway, snarkiness aside, my approach with stuff like this tends to be to write a Perl script that does the DB modification via SQL. I test it on copies of the production DB, and then, once I'm happy and the code is approved for production, run it on the production DB. I like to use a Perl script to run the SQL because I find that it's usually the easiest way to ensure newly added fields have the appropriate values assigned to them. For instance, I'll use ALTER TABLE to add a field as nullable, walk the table updating each record so it has the correct value, then ALTER TABLE again to set it not-nullable, and perhaps add an index on the new columns.
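        The add-nullable / backfill / constrain pattern described above can be sketched as follows. This is only an illustration under invented names (a `customer` table gaining a `discount` column), and it uses SQLite so it runs standalone; the final SET NOT NULL step is shown as a comment because SQLite lacks it, whereas Postgres supports it directly.

```python
# Sketch of: add the column nullable, walk the table filling in correct
# values, then constrain/index it. Table and column names are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, spend REAL)")
db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 50.0), (2, 250.0)])

# 1. Add the new field as nullable so existing rows stay valid.
db.execute("ALTER TABLE customer ADD COLUMN discount REAL")

# 2. Walk the table, giving every existing record the correct value.
db.execute("UPDATE customer SET discount = "
           "CASE WHEN spend > 200 THEN 0.1 ELSE 0 END")

# 3. Constrain and index the new column.
#    (In Postgres: ALTER TABLE customer ALTER COLUMN discount SET NOT NULL;)
db.execute("CREATE INDEX idx_customer_discount ON customer (discount)")

print(db.execute("SELECT id, discount FROM customer ORDER BY id").fetchall())
```

        Doing the three steps separately is what keeps the table usable throughout: at no point does an existing row violate a constraint.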

        Anyway, good luck.


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi


      I've been developing a web site with a friend for about the past 8 months. We've gone through a few major schema changes. This has been the approach we've followed when moving to a new schema and it has worked very well.

      When we're ready to go live on a new schema, I'll compare the old schema to the new one and see where the tables differ. From that, I generate SQL statements to copy the data from the old schema to the new schema (I use quite a bit of Perl code through all of this). We're using SQL Server, so we can copy data directly from one schema to another with SQL statements. If I couldn't do it this way, I'd probably use Perl to manage the copying.

      A few days before we go live, we'll create a new database, load the new schema, test the conversion scripts and do testing on the new site. If all is well, we'll create another new database and do it all again to go live.

      Most of our schema changes are due to new features being added, not design problems (yeah, that's what they all say :-).

      At my last job, we tried using 'ALTER TABLE' commands to maintain a database. This didn't work too well. Sometimes the changes were too involved to use 'ALTER TABLE' and tables would have to get recreated. This presented problems with tracking foreign key constraints. Besides, if you get something wrong in this process, you only have a database dump to fall back on (you do backup your database before changing the schema, right?). With the method described above, all of the old data is still intact in the old schema.
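      The compare-and-copy step described above can be sketched roughly like this. The dict-based schema description and all database/table names are my own simplification of what the poster generates with Perl, not actual code from the thread.

```python
# Diff two schema descriptions and emit INSERT ... SELECT statements
# covering only the columns the old and new tables share. New tables
# get no copy statement; dropped columns are simply not selected.
def copy_statements(old, new, old_db="olddb", new_db="newdb"):
    stmts = []
    for table, new_cols in new.items():
        if table not in old:
            continue                       # brand-new table: nothing to copy
        shared = [c for c in new_cols if c in old[table]]
        cols = ", ".join(shared)
        stmts.append(f"INSERT INTO {new_db}.{table} ({cols}) "
                     f"SELECT {cols} FROM {old_db}.{table}")
    return stmts

old = {"customer": ["id", "name"], "orders": ["id", "total"]}
new = {"customer": ["id", "name", "email"],
       "orders": ["id", "total"],
       "coupons": ["id"]}
for s in copy_statements(old, new):
    print(s)
```

      Generating the statements from a schema diff, rather than writing them by hand, is what makes this repeatable for every release.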

      90% of every Perl application is already written.
      dragonchild
Re: OT: Ways of managing changes to a database's structure during a project.
by borisz (Canon) on Feb 04, 2004 at 11:24 UTC
    I have a web application server that worked this way, and it was a good solution for me. I used .sql scripts and altered the tables on the fly. Web parts changed with Apache::Reload. If you prefer the use of fetchrow_hashref, your application will not notice most changes in the database anyway. That saved the most time for me.
    Boris
Re: OT: Ways of managing changes to a database's structure during a project.
by ysth (Canon) on Feb 04, 2004 at 11:40 UTC
    Don't forget to mandate a backup of the database before messing with it.
Re: OT: Ways of managing changes to a database's structure during a project.
by Abigail-II (Bishop) on Feb 04, 2004 at 11:41 UTC
    I'll keep it brief, because it has nothing at all to do with Perl. For three years, I worked at a company whose product heavily relied on a database as well, and the database structure was being modified on a daily basis (often several times a day). Tables would be added, columns added, indices created or removed, stored procedures, triggers, views, classifications of tables (which would determine whether a table was being replicated, and whether it was replicated one-way or two-way), data removed, added, altered. You name it, it happened. (The company made a large portion of its money by implementing customer requests - hence the constant modification of the database.)

    We used option 2. Options 1 and 3 just take too long, and require the database to be unavailable while modifying it. Now, this is OK if you apply patches to your production environment every three months - you do that in a scheduled window anyway. But in a testing environment, you can't make your testing databases unavailable for half an hour three times a day - and have 30 or 40 people unable to do their job. You want to be able to make updates on the fly.

    The boundary between development and staging was a bit different. The same database modification scripts were used, and there were two staging databases, db1 and db2, both identical, and with minimal data. If a developer wanted to 'release' one or more scripts, the scripts were first applied to db1. If no errors or warnings occurred, the scripts were applied to db2 as well. However, if there was a problem applying the scripts to db1, the scripts were rejected, and a copy of db2 was loaded into db1.
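    That db1/db2 gate can be sketched as below. This is only my illustration of the flow just described, with in-memory SQLite databases standing in for the real staging databases; the function and table names are invented.

```python
# Two-database release gate: a script must apply cleanly to db1 before
# it touches db2; on any failure db1 is reloaded from a copy of db2.
import sqlite3

def release(script, db1, db2):
    try:
        db1.executescript(script)
    except sqlite3.Error:
        db2.backup(db1)           # reject the script; restore db1 from db2
        return False
    db2.executescript(script)     # db1 was clean, so apply to db2 as well
    return True

db1 = sqlite3.connect(":memory:")
db2 = sqlite3.connect(":memory:")
for db in (db1, db2):
    db.executescript("CREATE TABLE t (id INTEGER)")

print(release("ALTER TABLE t ADD COLUMN x INTEGER", db1, db2))        # True
print(release("ALTER TABLE no_such ADD COLUMN y INTEGER", db1, db2))  # False
```

    The invariant the scheme maintains is that db2 only ever sees scripts that have already succeeded once, so the two databases never drift apart.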

    Furthermore, before scripts could be taken out of staging and into testing, I had to sign them off. Without my approval, they wouldn't make it into testing (let alone be released).

    Anyway, that was far more than I wanted to write.

    Abigail

Re: OT: Ways of managing changes to a database's structure during a project.
by rinceWind (Monsignor) on Feb 04, 2004 at 13:37 UTC
    You might want to take a look at Alzabo. This seems a good fit with what you are trying to do.

    --
    I'm Not Just Another Perl Hacker

      Thanks for the plug ;)

      If you're using Alzabo to manage your schema structure then using it as part of an install/update process is trivial. Here's some sample code from a Module::Build subclass I wrote:

      sub ACTION_update_db { $_[0]->_update_db; }

      sub _update_db {
          my $self = shift;
          $self->depends_on('install');

          require Alzabo::Create::Schema;

          my $schema_svn_dir =
              File::Spec->catdir( $self->base_dir,
                                  'alzabo-schema-files', 'NameOfSchema' );
          my $to_dir =
              File::Spec->catdir( Alzabo::Config::schema_dir(), 'NameOfSchema' );

          my $copied;
          foreach my $file ( glob File::Spec->catfile( $schema_svn_dir,
                                                       'NameOfSchema.*' ) ) {
              my $file = $self->copy_if_modified( from    => $file,
                                                  to_dir  => $to_dir,
                                                  flatten => 1,
                                                );
              $copied = $file if defined $file;
          }

          return unless $copied;

          my $s = Alzabo::Create::Schema->load_from_file( name => 'NameOfSchema' );

          my %connect = ( user => 'username', password => 'password' );

          if ( grep { /NameOfSchema/ } $s->driver->schemas(%connect) ) {
              # This makes the live schema in the database look
              # like the schema stored in the Alzabo data files.
              $s->sync_backend(%connect);
          }
          else {
              $s->set_instantiated(0);
              $s->create(%connect);
          }
      }

      This code can be run as its own action (./Build update_db), and it's also run when the install action is run. It's pretty darn convenient.

Re: OT: Ways of managing changes to a database's structure during a project.
by hv (Parson) on Feb 04, 2004 at 14:17 UTC

    This was a problem that I tried to solve for my work project, so let me give you an idea of the approach I used and how it has fared over the last few years. Note that we don't suffer the same rate of change in the database structure that you describe - after slightly over 3 years we have reached version 111 (decimal) of the database schema.

    The application has a single abstract class NVC::DB which mediates all database access and supplies core functionality for its concrete children, each mapping complete individual records from a particular table to instances. Each concrete child supplies initially the table definition, used to provide polymorphic accessor methods for each field and support selects and modifications on the table, and additionally has explicit methods for various aspects of the default behaviour specific to these objects.

    When part of the table definition needs to change, both old and new aspects of the definition are retained, along with instructions for any additional population measures required to effect the change. These are used to allow us automatically to upgrade older client sites to a specified version of the database schema when required to supply new functionality or fix bugs.

    Here's a simplified example:

    package NVC::DB::FileType;
    use strict;
    use NVC::DB;
    use vars qw/ @ISA /;
    BEGIN { @ISA = qw/ NVC::DB / }
    BEGIN { __PACKAGE__->register(<<TABLE);
    table filetype
    key:
        id int(11) not null auto_increment
    unique:
        name char(32) binary not null
        mimetype varchar(255) binary not null
    [version 99
    add:
        iconpath varchar(255) binary
    using:
        desc: add filepath for optional icon
        table: add iconpath varchar(255) binary
        pop: update filetype set filepath = "image/html.gif"
           + where mimetype = "text/html"
    ]
    [version 1
    add:
    using:
        desc: populate default filetypes
        pop: insert into filetype values
           + (1, 'plain text', 'text/plain'),
           + (2, 'HTML text', 'text/html'),
           + (3, 'other', 'application/octet-stream')
    ]
    TABLE
    }

    This versioning process was designed to achieve four primary goals: to allow automated database upgrades for individual sites to a given database schema version; to give a clear presentation in the source code of both the history of the table definition and its current state; to maximise code reuse by allowing the automatic generation of class-specific methods providing standard functionality; and to maximise the potential for applying bugfixes by installing the latest code, without being required to do the potentially disruptive database upgrade required for the latest new features.

    The automated database upgrade has succeeded very well - new functionality is normally developed separately, so that the changes associated with any new feature are usually encapsulated in a single database version change. The upgrade script is definitely the hairiest piece of code in the application as a whole, but it has been very stable and rarely needs changes.

    The clear presentation in the source code has been, for the core tables, a failure - the rate of change in the definitions of these tables has been much greater than anticipated, so that the table definitions in the unfiltered source make it almost impossible to find the current definition in a mess of nested changes. In practice the problem is somewhat lessened by the fact that the developer is most familiar with these classes, and in the calmer waters of less actively developed classes (where it is particularly needed) the presentation is usually clearer and more helpful.

    This problem was much reduced when we reached the point that all client sites were at least at v86, since we took the opportunity to remove most of the existing historical information from the latest sources (relying instead on the CVS repository to access that information if we ever again need it). However it is certain that this will become a problem again as development continues.

    The approach of maximising code re-use has, I think, been very successful: having a single clear consistent interface to all these objects also helps to maximise brain re-use.

    We haven't fully realised the intent of installing latest code with bugfixes without the database upgrade - it has proved much less practical than anticipated to ensure that changes for new features were isolated so as not to get executed on any code path for sites that did not enable the feature, and several new features were designed from the start to become endemic to the codebase. Despite that, it has at least proved a mostly practical system for use in non-public sites for code development, template and graphic design, and testing. Also, on the whole the NVC sites have shown very good stability in the relevant areas: few bugs in the code, prompt installation of bugfixes, few problems introduced by database upgrades and few problems caused by database/code version incompatibilities.

    Hope this helps,

    Hugo

Re: OT: Ways of managing changes to a database's structure during a project.
by dws (Chancellor) on Feb 04, 2004 at 16:52 UTC
    My problem is how to go about making these changes to the database structure whilst making it possible to apply them to a production system.

    This is a problem a number of other folks have smacked their heads on. You can find a number of good pointers by Googling for "refactoring databases" (or related keywords).

    Scott Ambler has written some good stuff on Refactoring Databases. Follow that link for some more leads. Most of it won't be on point for what you're looking for, but I bet you'll find plenty of useful information.

Re: OT: Ways of managing changes to a database's structure during a project.
by tilly (Archbishop) on Feb 05, 2004 at 00:43 UTC
    You can always scream loudly when database changes are requested and when it doesn't work complain bitterly that it is too hard to keep changing the database arbitrarily.

    I don't necessarily recommend this, but it certainly seems to be a popular approach...

    (However given the troubles of synchronizing revision control releases and databases, I'd certainly suggest that database changes should be viewed as fairly costly.)

Re: OT: Ways of managing changes to a database's structure during a project.
by podian (Scribe) on Feb 05, 2004 at 21:25 UTC
    As I see it this is a general database based project problem but I have not seen much discussion on it here or elsewhere (please correct if wrong). For the record I am using Postgres 7.4.

    It is also a standard upgrade problem - what to do when you go from one version to the next version, which changes the database structure.

    You have more challenges when doing it on a live system where the database can't go down (or maybe can only go down for a few minutes).

    I agree that each change must be versioned so that, for example you can go from version 1 to 3 (skipping version 2).

    One interesting problem to consider is:

    Say a customer has version 1.

    You made a table change (new column X) in version 2.

    You made another table change (new column Y) in version 3.

    Now your version 3 SQL will have only 'add column Y'.

    To upgrade from version 1 to 3, you need to combine both changes, or upgrade to version 2 first and then upgrade to version 3.
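    The chaining alternative described above can be sketched as follows. This is only an illustration: the patch statements and the idea of keying patches by their target version number are invented for the example.

```python
# Treat each patch as "the changes that produce version N" and walk the
# chain: a customer at version 1 who wants version 3 gets version 2's
# patch and then version 3's patch, in order.
patches = {
    2: ["ALTER TABLE t ADD COLUMN x INTEGER"],   # version 1 -> 2
    3: ["ALTER TABLE t ADD COLUMN y INTEGER"],   # version 2 -> 3
}

def upgrade_path(have, want, patches):
    """Collect every statement needed to go from version have to want."""
    steps = []
    for v in range(have + 1, want + 1):
        if v not in patches:
            raise KeyError(f"no patch producing version {v}")
        steps.extend(patches[v])
    return steps

print(upgrade_path(1, 3, patches))
```

    Chaining avoids having to hand-write a combined script for every possible starting version, at the cost of running the intermediate steps.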

Node Type: perlmeditation [id://326458]
Approved by Corion
Front-paged by broquaint