Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Auto-Increment and DBD Agnosticism

by skyknight (Hermit)
on Jun 22, 2004 at 18:39 UTC ( #368814=perlquestion: print w/ replies, xml ) Need Help??
skyknight has asked for the wisdom of the Perl Monks concerning the following question:

As I mentioned earlier today in A First CPAN Odyssey, I am presently in the throes of polishing up a module that I wish to publish on CPAN. It is a module that provides an object oriented framework for the representation of objects and links stored within a SQL database. While for my own purposes writing solely to the MySQL DBD is satisfactory, I'm striving to make the module as general as possible, a key component of that being DBD agnosticism. I'd like very much for any database that works with DBI to operate smoothly with my module.

To this end, one of my main concerns is dealing with the retrieval of auto-increment values after issuing an insert into a table. I am using surrogate primary keys for all linkages between objects, said key being an auto-increment column in every "object" table, and as such I will probably often want to know the value of this column after doing an insert so I can subsequently populate a "link" table with it.

Obtaining this value with the MySQL DBD is trivially easy. After executing an insert query, the statement handle object contains a field called mysql_insertid which one can grab without any trouble. Since this module is under development in a MySQL environment, I have thus far punted on generality and used the mysql_insertid field as a temporary kludge. Now, however, as I am wrapping things up, I need to take off this bandage and come up with a real solution.

The obvious and simple fix is simply to perform a select right after the insert, calling up the auto-increment value that way. This mostly works, though I am concerned that it is going to result in my code executing twice as many SQL statements, and there is the further danger that the saved row is not unique. On the other hand, I wonder what kind of overhead already exists as the result of the mysql_insertid's availability. Is DBD::mysql already transparently doing something along these lines, or is it somehow optimized?

The best compromise I can muster right now involves making the post-insert querying be "on-demand". Should a Perl object in my framework be saved to the underlying database, it would have a Boolean field called "saved" marked as true. Then when a user invoked the get_pk_value method (the method for retrieving the primary key), it would issue a select statement to ascertain the primary key if it saw that the "saved" flag was true but it did not have the primary key value cached. This still leaves a bitter taste in my mouth, but at least it would avoid the superfluity of executing a second statement that wasn't going to be of any value.

As a hybrid solution, I could make the object storage code semi-aware of the DBD in use, having it utilize the mysql_insertid field of the statement handle when it can, and falling back on the above-described solution when using other DBDs.

So, what I would really like to know... Does some better, less hackish way exist to accomplish that which I have described, or am I stuck dealing with it in some way along the lines of what I have delineated herein?

Comment on Auto-Increment and DBD Agnosticism
Re: Auto-Increment and DBD Agnosticism
by hardburn (Abbot) on Jun 22, 2004 at 18:46 UTC

    You might want to look at how Class::DBI solves this problem.

    My question is, do you have a specific reason for creating something new instead of using Class::DBI?

    ----
    send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

      The simple answer would be that "I hope to create something better". I am aware of the admonition "don't reinvent the wheel, unless you plan on inventing a better wheel". I hope that ultimately I will have created a better wheel, though I'm sure there will be much work involved, and am aware that ultimately Class::DBI may retain its dominance. Despite the possibility of my "failure", though, I would argue that there is much value to the existance of competing modules, as they will each force the other to adapt.
Re: Auto-Increment and DBD Agnosticism
by castaway (Parson) on Jun 22, 2004 at 19:05 UTC
    Have you looked at last_insert_id in the DBI documentation? This is a generic implemenation of what you are already using, available since DBI v1.38.

    C.

      It's not clear to me how well supported that is. I have heard that it isn't yet a bullet proof way to go about things.

      Update: I just looked at the DBI that came installed on my installation, a Fedora Core 1 distro, and the DBI that comes installed on it is 1.37! :-( I am quite leery about relying on a version that even I don't have installed by default. I wonder how reasonable it is to stipulate that others upgrade their DBI to 1.38. *sigh*

Re: Auto-Increment and DBD Agnosticism
by blokhead (Monsignor) on Jun 22, 2004 at 19:14 UTC
    DBI provides a last_insert_id method as of v1.38, but I haven't had luck in getting it to work, at least with my DBD::mysql version (2.9003).

    I applaud your goal of DBD-agnosticism. But it's really hard to be DBD-agnostic without also being DBD-specific. Look at all the big cross-platform DB modules out there (Class::DBI for instance). What they do is have all DBD-specific features subclassed out. That's the "right" way to do it. In my own database module, I had to resort to something similar for getting the last-insert-id, as well as schema detection. It goes something like this:

    package Foo; sub new { my $class = shift; my $dbh = ... my $subclass = "$class::$dbh->{Driver}{Name}"; eval "use $subclass; 1;" or croak "$dbh->{Driver}{Name} is unsupported"; bless { dbh => $dbh }, $subclass; } sub dbh { $_[0]->{dbh}; }
    Then you have a few small DBD-specific modules implementing insert_id:
    package Foo::mysql; sub insert_id { my $self = shift; $self->dbh->{mysql_insertid}; } ########### package Foo::Pg; ## notice how Pg requires the table and column, while the ## other DBDs ignore these args sub insert_id { my ($self, $table, $col) = @_; my $id; eval { my $seq = $table . '_' . $col . '_seq'; my $sth = $self->dbh->prepare( "select currval('$seq')" ); $sth->execute; ($id) = $sth->fetchrow_array; $sth->finish; 1; } or die; return $id; } ########## package Foo::SQLite; sub insert_id { my $self = shift; $self->dbh->func('last_insert_rowid'); }
    (In my own module, I don't actually bless the objects into the subclass, but I do store the name of the DBD subclass and call insert_id as a class method when I need it) Subclassing is clean and easy to maintain, plus it encourages me to stretch support to as many DBDs as possible.

    blokhead

      Thanks, suggesting the paradigm of subclassing to deal with specific DBDs is helpful. I feel like that probably is the right way to deal with cross-DBD inconsistencies. I am already using this for some constructs in my code... For example, I have an abstract class called SQL::Conf that serves as a base for subclasses that parse the connection configurations for specific databases. There is a subclass of it called SQL::Conf::MySQL that parses .my.cnf files, and when you invoke the create_handle method of SQL::DBH it instantiates the appropriate SQL::Conf subclass based on the RDBMS that you specify either as an argument to the method, or as an environment variable. Right now, however, I'm not using such a paradigm for database schema detection... I guess I'm probably going to get burned when trying other databases. :-( Presently I have a SQL::Table class that takes just a handle and a table name, and issues a "describe table foo" query, and populates itself with the returned information. From what you've said, I suppose that that implementation will break when I try something other than MySQL. Ho-hum...
Re: Auto-Increment and DBD Agnosticism
by lachoy (Parson) on Jun 22, 2004 at 20:27 UTC
      Thanks, that is quite helpful. I remember crossing this bridge before now, and I recall there being a troll under it.
Re: Auto-Increment and DBD Agnosticism
by mpeppler (Vicar) on Jun 23, 2004 at 06:50 UTC
    As others have mentioned, DBI has the last_insert_id() method since 1.38. However, I discovered as I was trying to implement this for DBD::Sybase that it is broken up to and including the current 1.42 release. The bug is that a flag in the DBI dispatcher is incorrect so that DBI never calls the underlying DBD's implementation of last_insert_id(). In addition, last_insert_id() will only work if the underlying DBD has implemented it.

    As you have discovered getting this value from the various database server engines requires some very different approaches. The most generic approach is to simply do a SELECT max(idvalue) FROM table right after the insert. If you are running with AutoCommit OFF, and if the server supports transactions then this should return the correct value, as long as you COMMIT after the SELECT max(). Otherwise this is open to a race condition in the case of multiple inserts to that same table.

    And FYI - for Sybase and MS-SQL the last inserted id (IDENTITY value) can be retrieved with a SELECT @@identity.

    Michael

Re: Auto-Increment and DBD Agnosticism
by awwaiid (Friar) on Jun 23, 2004 at 18:11 UTC
Re: Auto-Increment and DBD Agnosticism
by stvn (Monsignor) on Jun 23, 2004 at 20:23 UTC
    The obvious and simple fix is simply to perform a select right after the insert, calling up the auto-increment value that way. This mostly works, though I am concerned that it is going to result in my code executing twice as many SQL statements, and there is the further danger that the saved row is not unique.

    This method is not safe in multi-process environments at all (CGI, mod_perl, etc) as the INSERT followed by the SELECT is not an atomic action. I would stay away from this kludge as it will only get you in trouble down the road.

    On the other hand, I wonder what kind of overhead already exists as the result of the mysql_insertid's availability. Is DBD::mysql already transparently doing something along these lines, or is it somehow optimized?

    No, there is no extra SQL call by DBD::Mysql (as best I know without looking at the source that is). I believe that in the client library for MySQL (which DBD::MySQL uses) the 'last_insert_id' is a function/variable/environmental-parameter of some kind. So its actually optimized in MySQL itself.

    *snip a bunch of stuff* This still leaves a bitter taste in my mouth, but at least it would avoid the superfluity of executing a second statement that wasn't going to be of any value.

    That whole idea leaves a bitter taste in my mouth as well. Anything that is assuming you don't have duplicate data (aside from the primary key) will get you in trouble.

    So, what I would really like to know... Does some better, less hackish way exist to accomplish that which I have described, or am I stuck dealing with it in some way along the lines of what I have delineated herein?

    Subclassing, subclassing, subclassing. There is no such thing as truely "vanilla" SQL (at least not that I have seen and which is usable in a real world situation), so the idea of complete DBD (therefore database) agnosticism is a fantasy. There really exists only one elegant solution, which is to subclass.

    Also, keep in mind that all DBDs are not alike. Just because the DBI specification says it should do "A", doesn't mean the driver author has to actually implement that (or even better, have time to implement it). You would be surprised how minimal a DBD driver can be, and still "work" with DBI.

    -stvn

      Well, what I was envisioning would be safe in a multi-process environment. It would not be selecting the max id of the table, but rather selecting the id of the row that was inserted by using as conditions the values of the fields of the object that was just saved. I didn't like it as a fix without the imposition that some other key be specified as unique, and furthermore it worries me because of the overhead of the extra query, particularly if it means piping a lot of data back to the server in the form of the conditions.

      I upgraded to DBI 1.42, and tried using the last_insert_id() method, but alas it refused to do anything other than return undef. I don't know what is up with that... My version of DBD::mysql is one away from the latest version, so I tried installing the new one on the suspicion that this would fix it since the latest version of DBD:mysql was released right after the version of DBI that incorporated the last_insert_id() method, but the compile barfed. :-(

        Well, what I was envisioning would be safe in a multi-process environment. It would not be selecting the max id of the table, but rather selecting the id of the row that was inserted by using as conditions the values of the fields of the object that was just saved

        Even that is not safe, since you are relying on the fact that all your data (aside from the primary key) will be unique. This is okay if you have other uniquness constraints on the table, but otherwise you will eventually run into a problem with this. Most systems of this nature rely on the primary key as the natural unique identifier, I would really suggest you stick with that, and subclass for different DBDs.

        -stvn

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://368814]
Approved by bassplayer
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (8)
As of 2014-10-31 08:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (215 votes), past polls