http://www.perlmonks.org?node_id=60074
Category: DBI
Author/Contact Info grinder
Description:

Fellow Monks,

in the recent weeks I have been working with DBI.pm to deal with moving data from one database on one machine to another running on another machine. With DBI this was actually a lot of fun. The code has evolved quite a bit over time, as I sought to strip the problem back to its essence, and make it as clear as possible to see what is happening.

Basically what's going on is that a select is performed, and due to the radically different database designs, multiple inserts are required to transfer the information over. So what I had to do was to create an elegant way of dealing with insertions.

For the purpose of the discussion, let's just say that $db is an initialised DBI object and $ss is a statement thingy. I might select some information with

select foo, bar, baz, rat from sometable

I then have to insert that info across a number of different tables. The actual SQL might look like

insert into t1 (this, that) values (foo, bar)
insert into t2 (him, her, it) values (foo, baz, rat)
insert into t3 (ping, pong, thing) values (foo, rat, 17)

Where foo, bar et al are of course the values returned from the select. The code to do the above looks like this

while( my $d = $ss->fetchrow_hashref() ) { insert 't1', { this => $d->{foo}, that => $d->{bar}, }; insert 't2', { him => $d->{foo}, her => $d->{baz}, it => $d->{rat}, }; insert 't3', { ping => $d->{foo}, pong => $d->{rat}, thing => 17, }; }

That's about as far as I can strip the problem domain back down to the essentials. But wait, there's more. After I had written about 12 scripts on this principle, two problems arose. Firstly, the thing was just too damned slow, because I was using $db->do(). I was building up a string that contained the insert statement in longhand, with something like

my $sql = "insert into $table(" . join(',' => sort keys %$data) . ')values(' . join( ',' => @{$data{sort keys %$data}} ) . ')';

or something along those lines (the original code has since be thrown away), using a hash slice, and sorting the keys to make sure the values lined up correctly, which, upon reflexion maybe overly paranoid, but hey, it works. The do had to go, I had to start thinking about using placeholders and prepare and execute. And Devel::DProf showed me that the two sorts per insert added a non-trivial amount of execution time as well.

So the thing was to do was to check the table name and the first time it wa seen, to generate an insert statement with placeholders and stash that away in a hash.

insert into t1 (this, that) values (?, ?)
insert into t2 (him, her, it) values (?, ?, ?)
insert into t3 (ping, pong, thing) values (?, ?, ?)

And since I had a hash hanging around, I figured I could also stash away a reference to an array of the sorted SQL column names away, and do away with several million C<sort>s. I also ran into the problem that your standard DBD::Sybase doesn't allow multiple $sss per $db, so I created one DBI object per table being inserted into.

This worked well for a while: total execution time fell down to a couple of hours. But then, after a while I had to do something like

insert 't2', { him => $d->{foo}, her => $d->{baz}, it => $d->{rat}, }; insert 't2', { him => $d->{foo}, her => 5, };
which meant that my carefully prepared insert statement blew up on the second call. So I fixed that with an ugly hack to distinguish the different ways of inserting into a table, on the few occasions it was required
insert 't2', { him => $d->{foo}, her => $d->{baz}, it => $d->{rat}, }, 'want-it'; insert 't2', { him => $d->{foo}, her => 5, }, 'no-it';
So with that, I can finally show the snippet of code which is the point of this writeup, and that's the insert subroutine

I think the AUTOCOMMIT and COMMIT_BATCH constants are pretty easy to figure out.

sub insert {
  my( $table, $data, $key ) = @_;
  my $t = $key ? "$table-$key" : $table;

  if( !defined $dbo->{$t} ) {
    # first time through, set some things up
    my $db = DBI->connect(DB_IN_SRV, DB_IN_DB, DB_IN_PW,
      {AutoCommit => AUTOCOMMIT, PrintError => 0})
        or die "Couldn't connect to database @{[DB_IN_SRV]} "
          . DBI->errstr;
    my $s = [sort keys %$data];
    my $sql = "insert into $table("
      . join(',' => @$s)
      . ')values('
      . join( ',' => ('?') x @$s )
      . ')';
    my $ss = $db->prepare($sql)
      or die "[$table/$t] prepare $sql\n" . $db->errstr . "\n";
    @{$dbo->{$t}}{qw/DB STH SQL SORT COUNT ERR/ } =
      ($db, $sth, $sql, $s, 0, 0);
  }

  ++$dbo->{$t}{COUNT};
  if( !$dbo->{$t}{STH}->execute(
      @$data{ @{$dbo->{$t}{SORT}} } ) ) {
    warn $dbo->{$t}{COUNT}, ': ',
      $dbo->{$t}{SQL}, "\n",
      join( "\n", @$data{ @{$dbo->{$t}{SORT}} } ), "\n",
      $dbo->{$t}{DB}->errstr, "\n\n";
    ++$dbo->{$t}{ERR};
    # give up quickly if things are going worng
    if( $dbo->{$t}{COUNT} > 10
        and $dbo->{$t}{ERR} > 0.6 * $dbo->{$t}{COUNT} ) {
      die "Giving up.\n";
    }
  }
  AUTOCOMMIT
    or $dbo->{$t}{COUNT} % COMMIT_BATCH
    or $dbo->{$t}{DB}->commit;
}