SQL Diff

pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Greetings ye Monks.

I want to put a database into version control. We have a devel and production web server with lots of stuff in a database. Small updates often go directly to the production server and larger updates go first to the devel server and later to the production server. Often the small changes only go to the production server and some testing data remains on the devel server, and the two servers generally diverge. If I could put the two databases into version control, I could merge them in a rational fashion from time to time.

The problem is, how do you diff and merge a database? This is tricky because normal text diff is based on changes in lines of text, which might work on a text dump of a database if you didn't also have to worry about ID collisions. An ID collision as I'm talking about it is where someone adds some data to one datbase and that data gets an automatically assigned ID key. Unless the next available ID is always the same on both databases, different bits of data will have different IDs, which breaks everything.

I think the answer would involve a diff-like utility that examines the differences between two databases and produces SQL that could make the appropriate changes to another database in a way that won't cause ID collisions. This brings me to the perl part of this question.

Does anyone have any recommendations for how to go about comparing two databases like this? I assume the heavy lifting can be done by existing perl modules. Which would people recommend? Any other ideas?

Thanks!
--Pileofrogs

Comment on SQL Diff

Replies are listed 'Best First'.
Re: SQL Diff by Corion (Patriarch) on Jun 01, 2009 at 16:58 UTC
The easy way would be to create an ASCII dump of both databases (using, for example, `mysqldump`) and then use a traditional `diff` (or Algorithm::Diff) on those files. There is a module on CPAN that allows you to do the diff between two databases by issuing `select , count() from $table group by *`, which is quite memory intensive, but might work as well for you. Also, there is SQL::Translator::Diff, which claims to create the proper `ALTER TABLE` statements for you.	[reply] [d/l] [select]
Re: SQL Diff by perrin (Chancellor) on Jun 01, 2009 at 17:09 UTC
It is 100% unpossible to automatically diff schemas. How can a diff know that you changed the name of a column, vs removing a column and adding a new one? Or that you split out some columns into a separate table and the data has to be moved? You can get the easy stuff with SQL::Translator::Diff, but you will always have to manually check it and add things to it. On the other hand, if your goal is to get a copy of the real data for testing, you might be better off using replication or just restoring a dump of the production data on your dev server.	[reply]
Re: SQL Diff by jhourcle (Prior) on Jun 01, 2009 at 17:30 UTC
Many databases have an option to generate replay logs. (so you can then catch up to when the last full backup was). I would assume you'd want to generate the logs, and then try to diff those, rather than the whole database. Of course, the problems with autonumbers won't come out on their own -- you'll have to know which tables have that issue, and then have something else follow up and look for the values that were assigned by inserts on those tables.	[reply]
Re: SQL Diff by dHarry (Abbot) on Jun 02, 2009 at 06:36 UTC
Small updates often go directly to the production server Don't. Put the database under proper configuration management. And create a decent release procedure! How can you ever test something and be sure it will run in the production environment without surprises? You need a representative test environment, i.e. at least the db schema should be identical. Preferably a test environment separated from the development environment. ID collisions This is one of the "features" of meaningless attributes, i.e. generated ID's. But normally tables should have meaningful attributes too e.g. some column or collection of columns which should be unique. Focus on those to determine whether records are equal or not.	[reply]


Perl: the Markov chain saw
	PerlMonks