Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

SQL Diff

by pileofrogs (Priest)
on Jun 01, 2009 at 16:53 UTC ( [id://767319]=perlquestion: print w/replies, xml ) Need Help??

pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Greetings ye Monks.

I want to put a database into version control. We have a devel and production web server with lots of stuff in a database. Small updates often go directly to the production server and larger updates go first to the devel server and later to the production server. Often the small changes only go to the production server and some testing data remains on the devel server, and the two servers generally diverge. If I could put the two databases into version control, I could merge them in a rational fashion from time to time.

The problem is, how do you diff and merge a database? This is tricky because normal text diff is based on changes in lines of text, which might work on a text dump of a database if you didn't also have to worry about ID collisions. An ID collision as I'm talking about it is where someone adds some data to one datbase and that data gets an automatically assigned ID key. Unless the next available ID is always the same on both databases, different bits of data will have different IDs, which breaks everything.

I think the answer would involve a diff-like utility that examines the differences between two databases and produces SQL that could make the appropriate changes to another database in a way that won't cause ID collisions. This brings me to the perl part of this question.

Does anyone have any recommendations for how to go about comparing two databases like this? I assume the heavy lifting can be done by existing perl modules. Which would people recommend? Any other ideas?

Thanks!
--Pileofrogs

Replies are listed 'Best First'.
Re: SQL Diff
by Corion (Patriarch) on Jun 01, 2009 at 16:58 UTC

    The easy way would be to create an ASCII dump of both databases (using, for example, mysqldump) and then use a traditional diff (or Algorithm::Diff) on those files. There is a module on CPAN that allows you to do the diff between two databases by issuing select *, count(*) from $table group by *, which is quite memory intensive, but might work as well for you. Also, there is SQL::Translator::Diff, which claims to create the proper ALTER TABLE statements for you.

Re: SQL Diff
by perrin (Chancellor) on Jun 01, 2009 at 17:09 UTC

    It is 100% unpossible to automatically diff schemas. How can a diff know that you changed the name of a column, vs removing a column and adding a new one? Or that you split out some columns into a separate table and the data has to be moved? You can get the easy stuff with SQL::Translator::Diff, but you will always have to manually check it and add things to it.

    On the other hand, if your goal is to get a copy of the real data for testing, you might be better off using replication or just restoring a dump of the production data on your dev server.

Re: SQL Diff
by jhourcle (Prior) on Jun 01, 2009 at 17:30 UTC

    Many databases have an option to generate replay logs. (so you can then catch up to when the last full backup was).

    I would assume you'd want to generate the logs, and then try to diff those, rather than the whole database.

    Of course, the problems with autonumbers won't come out on their own -- you'll have to know which tables have that issue, and then have something else follow up and look for the values that were assigned by inserts on those tables.

Re: SQL Diff
by dHarry (Abbot) on Jun 02, 2009 at 06:36 UTC

    Small updates often go directly to the production server

    Don't. Put the database under proper configuration management. And create a decent release procedure! How can you ever test something and be sure it will run in the production environment without surprises? You need a representative test environment, i.e. at least the db schema should be identical. Preferably a test environment separated from the development environment.

    ID collisions

    This is one of the "features" of meaningless attributes, i.e. generated ID's. But normally tables should have meaningful attributes too e.g. some column or collection of columns which should be unique. Focus on those to determine whether records are equal or not.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://767319]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2024-04-25 15:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found