
Advice on syncing heterogeneous database tables

by tcf03 (Deacon)
on Feb 05, 2007 at 20:57 UTC ( #598410=perlquestion )

tcf03 has asked for the wisdom of the Perl Monks concerning the following question:

A while back I was tasked with syncing some tables on an Informix DB with a SQL Server DB - basically it was a rewrite of someone else's code, which was pretty nasty. I didn't have a lot of time to do this, so I wrote it thinking that I could come back and rewrite the rewrite. I am now in a situation where I need to make data go both ways, and in some instances I need to drop entire tables and reload them. So I am modifying the same code to do more and more things, and the code does not always translate well to what I need it for.

So - now I'm going to rewrite the rewrite, but I'd like to ask some advice from some who, I am sure, have done this very thing more than I.

Here is the original code that I wrote:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Carp;

## For quick sanity checks.
## use diagnostics;
# use Data::Dumper;

## The following BEGIN is necessary to ensure proper
## database connectivity. It sets up ALL necessary
## environmental and Perlish goo.
## When debugging this code, realize that
## the following vars:
##   @data_map $rkey $lkey $rtable $ltable
## are all defined in the module being named on
## the command line.
BEGIN {
    push @INC, '/usr/local/yyy/CDR/lib';
    push @INC, '/usr/local/perl/lib';
    eval "use $ARGV[0]";
    die $@ if $@;
}

# Use YYY specific DB connections.
# See the source under /usr/local/perl/lib
# for a better explanation.
use YYY::yyyDB;

## Modify $0 so it shows up in the
## process list as something other than perl
$0 =~ s!.*/!!;

# code refs simplify function usage
my $fetch = \&fetch;

# Hold the errors for reporting...
my @ERRS;

print "starting Log\n";

## start the log
my $LOG_DIR = '/usr/local/yyy/CDR/Logs';
my ( $sec, $min, $hour, $mday, $mon, $year, $wday, $yday ) = localtime();
$year += 1900;
my $LOG = sprintf "$ARGV[0]-%4i%02i%02i%04i", $year, $mon, $mday, $$;
my $LF;    # Log File
open $LF, '>', "$LOG_DIR/$LOG"
    or confess "unable to open $LOG_DIR/$LOG: $!\n";

print "starting Exceptions Log\n";

# Start an exceptions file as well
my $EF;
open $EF, '>', "$LOG_DIR/EXCEPTIONS-$LOG"
    or confess "unable to open $LOG_DIR/EXCEPTIONS: $!\n";

print $LF scalar(localtime), "\n\n";
print "initializing DB connections\n";

## Local - Informix server
my $ldbh = yyyDB->Iproduction();
$ldbh->{ChopBlanks} = "true";
$ldbh->{PrintError} = 0;
$ldbh->{RaiseError} = 0;
# End of Local DB connection

## REMOTE - SQL Server
my $rdbh = yyyDB->Stest();
$rdbh->{ChopBlanks} = "true";
$rdbh->{PrintError} = 0;
$rdbh->{RaiseError} = 0;
$rdbh->{AutoCommit} = 1;
# End of Remote DB connection

# What order will data be extracted from and inserted
# into the databases?
# We need to be consistent with our inserts, as we are dealing
# with a hash with its own sorting rules... So we set this
# to a known sorted order, and thus have no surprises.
my $map = \@data_map;
my ( @rmaporder, @lmaporder );
for my $i ( 0 .. $#data_map ) {
    push @rmaporder, $map->[$i]->{'rdata'};
    push @lmaporder, $map->[$i]->{'ldata'};
}
print "\@rmaporder = @rmaporder\n\n";
print "\@lmaporder = @lmaporder\n\n";
print $LF "remote table => $rtable\nremote key => $rkey\n";
print $LF "local table => $ltable\nlocal key => $lkey\n";

# The query to select our data from the remote server
my $rfieldlist = join ",", @rmaporder;
my $rquery = qq{select $rfieldlist from $rtable};
print "Remote query = $rquery\n";

# This is for the local query only
my $lfieldlist = join ",", @lmaporder;
my $lquery = qq{select $lfieldlist from $ltable};
print "Local query = $lquery\n";

# prepare and execute our SQL query on the local DB
my $lsth = $ldbh->prepare($lquery)
    or errpt( 'FATAL', $ldbh->errstr(), 'NA', $lquery );
$lsth->execute()
    or errpt( 'FATAL', $ldbh->errstr(), 'NA', $lquery );

# prepare and execute our SQL query on the remote DB
my $rsth = $rdbh->prepare($rquery)
    or errpt( 'FATAL', $rdbh->errstr(), 'NA', $rquery );
$rsth->execute()
    or errpt( 'FATAL', $rdbh->errstr(), 'NA', $rquery );

my %RemoteDB;
my %LocalDB;

# hashes to hold the initial db slurp
my %rdb;
my %ldb;

my $counter = 0;
print "Binding row data locally\n";

# Bind columns to hashes -
# this is faster and more efficient
# than the alternatives.
for (@lmaporder) {
    $counter++;
    $ldb{$_} = undef;
    $lsth->bind_col( $counter, \$ldb{$_} );
}
print "local counter = $counter\n";

print "Binding row data remotely\n";
$counter = 0;
for (@rmaporder) {
    $counter++;
    $rdb{$_} = undef;
    $rsth->bind_col( $counter, \$rdb{$_} );
}
print "remote counter = $counter\n";

# Fetch the data and hold it in these hashes
print "Fetching remote data\n";
%RemoteDB = $fetch->( $rsth, \%rdb, $rkey );
print "fetching local data\n";
%LocalDB = $fetch->( $lsth, \%ldb, $lkey );

# For DEBUG ONLY...
# print "\n=================================================\n\n";
# print "RemoteDB\n";
# print Dumper %RemoteDB;
# print "\n=================================================\n\n";
# print "LocalDB\n";
# print Dumper %LocalDB;
# print "\n=================================================\n\n";

my @rinsert_keys;
my @rinspect_keys;
my @rdelete_keys;

# Populate arrays with keys.
#
# @rinsert_keys are keys that exist locally
# and DO NOT exist remotely. These will always
# be inserts.
for ( keys %LocalDB ) {
    push @rinsert_keys, $_ if ( !exists $RemoteDB{$_} );
}

# @rinspect_keys represent rows of data
# where the keys are the same remotely
# and locally - this is a dumping
# ground whose data we will inspect
# further later on.
for ( keys %LocalDB ) {
    push @rinspect_keys, $_ if ( exists $RemoteDB{$_} );
}

# @rdelete_keys represent rows of data where
# the key exists remotely but not locally -
# these just get deleted.
for ( keys %RemoteDB ) {
    push @rdelete_keys, $_ if ( !exists $LocalDB{$_} );
}

# Insert routines
print $LF "\nINFO: +++ There are ", scalar(@rinsert_keys), " to insert\n";
for my $keys (@rinsert_keys) {
    my @VALUES;
    for my $count ( 0 .. $#lmaporder ) {
        push @VALUES, $LocalDB{$keys}{ $lmaporder[$count] };
    }
    my $values = join ',', @VALUES;
    my $query  = "INSERT into $rtable ($rfieldlist) VALUES ($values)";
    print $LF "INSERT: $keys QUERY: $query\n";
    my $rsth = $rdbh->prepare($query)
        or errpt( "INSERT", $rdbh->errstr(), $keys, $query );
    $rsth->execute()
        or errpt( "INSERT", $rdbh->errstr(), $keys, $query );
}

# Inspect/Update routines
print $LF "\nINFO: +++ There are ", scalar(@rinspect_keys), " to INSPECT\n";
for my $keys (@rinspect_keys) {
    my @UPDATESET;
    my @WHERE;
    my $UPDATE;

    # Inspect the actual column data; if it's different, push it
    # onto the @UPDATESET array for later processing.
    for my $count ( 0 .. $#lmaporder ) {
        print $LF "INSPECT $keys: Local $lmaporder[$count] = $LocalDB{$keys}{$lmaporder[$count]}\n";
        print $LF "INSPECT $keys: Remote $rmaporder[$count] = $RemoteDB{$keys}{$rmaporder[$count]}\n";
        push @UPDATESET, "$rmaporder[$count] = $LocalDB{$keys}{$lmaporder[$count]}"
            unless ( $LocalDB{$keys}{ $lmaporder[$count] } eq
                     $RemoteDB{$keys}{ $rmaporder[$count] } );
    }
    $UPDATE = join ', ', @UPDATESET;

    # If there is anything here, go to work...
    if ( scalar(@UPDATESET) > 0 ) {
        my $query = "UPDATE $rtable SET $UPDATE WHERE $rkey = $RemoteDB{$keys}{$rkey}";
        print $LF "UPDATE: $keys QUERY=$query\n";
        my $rsth = $rdbh->prepare($query)
            or errpt( "UPDATE", $rdbh->errstr(), $keys, $query );
        $rsth->execute()
            or errpt( "UPDATE", $rdbh->errstr(), $keys, $query );
    }
    else {
        print $LF "INSPECT: $keys Nothing to do\n";
        next;
    }
}

# Delete routines
print $LF "\nINFO: +++ There are ", scalar(@rdelete_keys), " to delete\n";
for my $keys (@rdelete_keys) {
    my $query = "DELETE FROM $rtable WHERE $rkey = $RemoteDB{$keys}{$rkey}";
    print $LF "DELETE: $keys\nDELETE: $query\n";
    my $rsth = $rdbh->prepare($query)
        or errpt( "DELETE", $rdbh->errstr(), $keys, $query );
    $rsth->execute()
        or errpt( "DELETE", $rdbh->errstr(), $keys, $query );
}

sub fetch {
    my $sth    = shift;
    my $dbhash = shift;
    my $key    = shift;
    my %DB;
    while ( $sth->fetchrow_hashref ) {

        # Mangle the Salary table.
        if ( $rtable eq 'Salary' ) {
            $dbhash->{'SalaryAmount'} =~ s/(\d+).*/$1/
                if (    $dbhash->{'SalaryAmount'}
                    and $dbhash->{'SalaryAmount'} =~ /\d+/ );
            $dbhash->{'HourlyRate'} = "0.000"
                if (    $dbhash->{'HourlyRate'}
                    and $dbhash->{'HourlyRate'} !~ /\d+/ );
        }

        # Ensure that ALL data is properly quoted
        # using the $dbh->quote method.
        $DB{ $dbhash->{$key} }{$_} = $rdbh->quote( $dbhash->{$_} )
            for keys %$dbhash;
    }

    # Note: "return %DB or errpt(...)" never calls errpt (the low-
    # precedence 'or' binds after return), so test %DB explicitly.
    errpt( 'FATAL', 'Unable to return data from fetch', 'NA', 'NA' )
        unless %DB;
    return %DB;
}

sub errpt {
    my $TYPE   = shift;
    my $DB_ERR = shift;
    my $KEY    = shift;
    my $QUERY  = shift;
    my $ERR    = "$TYPE key $KEY: $DB_ERR";
    push @ERRS, $ERR;
    print $LF "$TYPE FAILURE: key = $KEY $DB_ERR\n$QUERY\n";
    return 0 unless ( $TYPE eq 'FATAL' );
    return 1;
}
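The three key-classification loops (insert / inspect / delete) are the heart of the sync, and they can be pulled into one function and exercised without touching either database. A minimal, self-contained sketch of the same diff logic, using hypothetical rows keyed by primary key:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify keys into inserts (local only), inspects (both sides,
# candidate updates), and deletes (remote only) - the same logic as
# the three loops in the script above.
sub diff_keys {
    my ( $local, $remote ) = @_;
    my ( @insert, @inspect, @delete );
    for my $k ( keys %$local ) {
        if   ( exists $remote->{$k} ) { push @inspect, $k }
        else                          { push @insert,  $k }
    }
    for my $k ( keys %$remote ) {
        push @delete, $k unless exists $local->{$k};
    }
    return ( \@insert, \@inspect, \@delete );
}

# Hypothetical data: key 1 is new locally, key 2 exists on both
# sides, key 3 exists only remotely.
my %LocalDB  = ( 1 => { desc => 'a' }, 2 => { desc => 'b' } );
my %RemoteDB = ( 2 => { desc => 'B' }, 3 => { desc => 'c' } );

my ( $ins, $upd, $del ) = diff_keys( \%LocalDB, \%RemoteDB );
print "insert: @$ins  inspect: @$upd  delete: @$del\n";
# prints "insert: 1  inspect: 2  delete: 3"
```

Isolating the classification this way makes the sync logic unit-testable, which helps when the rewrite has to support both directions.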
For the most part it works fairly well. I call a module on the command line which provides a data map to translate the column names:
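One thing worth changing in a rewrite: the script builds its INSERT/UPDATE/DELETE statements by interpolating pre-quoted values into SQL strings. DBI placeholders let each driver handle quoting and typing itself, which matters when the same code talks to both Informix and SQL Server. A sketch using the column names from the Labor map below (the commented lines assume a live `$rdbh` handle and a `$row` hashref, which this fragment does not create):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a driver-neutral INSERT with one '?' placeholder per column.
my $rtable = 'Labor_ID';
my @cols   = qw(LaborID Description DirectFlag LaborCat);

my $fieldlist    = join ',', @cols;
my $placeholders = join ',', ('?') x @cols;
my $sql = "INSERT INTO $rtable ($fieldlist) VALUES ($placeholders)";

# With a real handle, the driver quotes the bound values itself:
# my $sth = $rdbh->prepare($sql);
# $sth->execute( @{$row}{@cols} );    # hash slice in column order

print "$sql\n";
# prints "INSERT INTO Labor_ID (LaborID,Description,DirectFlag,LaborCat) VALUES (?,?,?,?)"
```

This would also remove the need to pre-`quote` everything inside `fetch`, keeping the fetched hashes as raw values.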
package Labor;
use strict;
require Exporter;

my $VERSION = 1.00;
our @ISA         = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw( @data_map $rkey $lkey $rtable $ltable ) ] );
our @EXPORT_OK   = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT      = qw( @data_map $rkey $lkey $rtable $ltable );

## @data_map, $rkey, $lkey, $rtable, and $ltable
## include all necessary information needed to
## map data from the local to the remote SQL server.
our @data_map = (
    { rdata => 'LaborID',     ldata => 'labor_id'    },
    { rdata => 'Description', ldata => 'description' },
    { rdata => 'DirectFlag',  ldata => 'labor_type'  },
    { rdata => 'LaborCat',    ldata => 'labor_cat'   },
);
our $rkey   = 'LaborID';
our $lkey   = 'labor_id';
our $rtable = 'Labor_ID';
our $ltable = 'lmflaborcd';

1;
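Since the map modules are pure data, one lighter-weight alternative (a sketch, not the existing setup; the file name `labor.map` and its layout are hypothetical) is to keep each map as a plain Perl data file returning a hashref, loaded with `do`. Contributors then edit only data, never package/Exporter plumbing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a sample map file so this example is self-contained; in
# practice contributors would just edit the file directly.
my $mapfile = 'labor.map';
open my $fh, '>', $mapfile or die "cannot write $mapfile: $!";
print $fh <<'EOM';
{
    rtable   => 'Labor_ID',
    ltable   => 'lmflaborcd',
    rkey     => 'LaborID',
    lkey     => 'labor_id',
    data_map => [
        { rdata => 'LaborID',     ldata => 'labor_id'    },
        { rdata => 'Description', ldata => 'description' },
    ],
}
EOM
close $fh;

# 'do' compiles the file and returns its last expression - here the
# anonymous hashref. The "./" prefix avoids the @INC search (and the
# missing-dot-in-@INC issue on Perl 5.26+).
my $map = do "./$mapfile";
die "bad map file: $@ $!" unless ref $map eq 'HASH';

print "remote table: $map->{rtable}\n";
unlink $mapfile;
```

The script would then take the map file name on the command line instead of a module name, and the `eval "use $ARGV[0]"` in the BEGIN block goes away.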
Admittedly it's not the most elegant, but again, it has worked where others needed to provide data maps easily. I'd like to keep the app modular, inasmuch as it should stay easy for others to write maps. So, any advice or pointers to information from others who have done this before is greatly appreciated.
"That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
  --Ralph Waldo Emerson

Replies are listed 'Best First'.
Re: Advice on syncing heterogeneous database tables
by roboticus (Chancellor) on Feb 06, 2007 at 00:04 UTC

    I recently had a project similar (in some respects) to this one. I didn't want anything to do with maintaining the translation tables, so I wrote a module that reads an Excel spreadsheet (via Spreadsheet::ParseExcel) and generates the required code. The users then only had to edit the spreadsheet to generate the desired data transform. It worked very nicely, as I could require a particular format.
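    roboticus's module isn't shown, but the generation side of the idea is small. A core-Perl sketch that emits a `@data_map` declaration from "remote,local" column pairs - driven here by an inline CSV string rather than Spreadsheet::ParseExcel so it stays self-contained (the two-column layout is an assumption):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a parsed spreadsheet: one "remote,local" pair per line.
my $sheet = <<'EOS';
LaborID,labor_id
Description,description
EOS

# Split into [remote, local] pairs, skipping blank lines.
my @pairs = map { [ split /,/ ] } grep { /\S/ } split /\n/, $sheet;

# Emit the @data_map source text a map module would contain.
my $code = "our \@data_map = (\n";
$code .= "    { rdata => '$_->[0]', ldata => '$_->[1]' },\n" for @pairs;
$code .= ");\n";

print $code;
```

    With Spreadsheet::ParseExcel, the `$sheet` string would instead come from iterating the worksheet's cells; the emission step stays the same.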


Re: Advice on syncing heterogeneous database tables
by djp (Hermit) on Feb 07, 2007 at 03:45 UTC
    Suggest you consider a commercial product, such as those offered by or . These have a free trial period; if the tool does the job, then purchase it. It will be cheaper and more reliable than rolling your own.

Node Type: perlquestion [id://598410]
Front-paged by Arunbear