Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

OO Perl & RDBMS Strategy Question

by DamnDirtyApe (Curate)
on Aug 30, 2002 at 02:26 UTC ( #193968=perlquestion: print w/replies, xml ) Need Help??
DamnDirtyApe has asked for the wisdom of the Perl Monks concerning the following question:

Mine is a strategy question relating to object-oriented Perl and relational databases.

I'm interested in how people write classes that represent objects in a database, and to what extent operations on an object are mapped onto the database object. The following paragraph from Building a Large-Scale E-commerce site with Apache and mod_perl : Code Structure got me thinking about this again:

Classes in the Model layer represented business concepts and data, like products or users. These had an API but no end-user interface. They knew nothing about HTTP or HTML and could be used in non-web applications, like cron jobs. They talked to the database and other data sources, and managed their own persistence.

I'm curious as to how people actually implement these systems in the real world. As I see it, there are at least two possibilities:

  • When an object is created, it requires a unique identifier which ties it to some conceptual object in the database. Each operation on the object triggers a call to the database (eg. calling $person->setName( 'Bob' ) would trigger an UPDATE in the database.) None of the data about the object is actually stored in the object, but in the database. This would mean that the objects were always in sync with the database (since they are only really a thin layer over the database), but would mean a large number of database calls.
  • The object has methods like load() and save() that allow it to load the information about the object from the database, then save it back to the database with an UPDATE at a later time. All of the object data is stored in the object, and function calls do not trigger database calls. This would keep the database calls to a minimum, but would mean that objects could be out of sync with the database.

Of course, there may be more ways to do it that I'm just not seeing. How do you do it?

Those who know that they are profound strive for clarity. Those who
would like to seem profound to the crowd strive for obscurity.
            --Friedrich Nietzsche

Replies are listed 'Best First'.
Re: OO Perl & RDBMS Strategy Question
by perrin (Chancellor) on Aug 30, 2002 at 03:07 UTC
    As the article you're quoting from said, we used the latter method in that system. We had load and save methods. Making every method call act on the database is just not necessary in the vast majority of situations, and this seems to be agreed on in the Java camp as well, where the "best practice" for working with EJBs (objects that are stored in a database) is to pass a hash with multiple attribute values rather than call individual methods.

    My advice about modelling objects that are stored in a database is to not even attempt to hide the fact that they're stored in a database. Systems that try to avoid thinking about transactions or replace SQL completely end up with poor performance and compromises. The real benefit from this stuff is being able to tie up pieces of reusable functionality into neat little bundles, so that every time you load a "user" you can just call the same method.

Re: OO Perl & RDBMS Strategy Question
by gav^ (Curate) on Aug 30, 2002 at 02:47 UTC
    A good starting point is POOP (Perl Object-Oriented Persistence) which lists various tools that can help with DB->OO mapping.

    Personally I've done bit of development using Class::DBI which I find to be a nice lightweight mapping.

    This node of mine might be helpful too.


Re: OO Perl & RDBMS Strategy Question
by dws (Chancellor) on Aug 30, 2002 at 03:24 UTC
    I'm curious as to how people actually implement these systems in the real world.

    If you're really serious about building middle-tier OO systems on top of an RDBMS, I recommend reading the manuscript for Martin Fowler's next book, Patterns of Enterprise Application Architecture. (The manuscript will be on-line until the book is published later this year.)

    The examples in the book are in Java or C#, but translate fairly well to Perl.

    The short version is there are choice you need to make about locking models. These choices affect how you manage caches, and how you ensure that the right changes get written back to the database in the right order. Getting a middle-tier system working well requires a lot of careful bookkeeping.

      w3mir to the rescue! $> perl w3mir -r
      ADDED: For the record, Mr. Fowler has written two of the great books on OO: UML Distilled and Refactoring, so this one is of particular interest. Thanks for the link!
Re: OO Perl & RDBMS Strategy Question
by lachoy (Parson) on Aug 30, 2002 at 03:55 UTC

    I strongly disagree with the idea that persistent object storage should be totally transparent. Normally this translates to your first option, syncing the database on every set() method call. Even if you do a bulk set (multiple fields at once) this means you need to do the same for the get(), otherwise you've lost the goal of having each object represent what's in the database right now. It may be necessary in some scenarios but I haven't programmed in any of them, plus it's both a database and network nightmare.

    Instead, push the decision on when to save back on the user of the object. Yes, having a load() (or fetch()) and save() makes serialization visible. Good! The user always knows what's being done with the object and when a save is appropriate. I've taken this tack with SPOPS and it's seemed to work well. And SPOPS has an option to refetch the data for an object from the database, in the case that you have long-lived objects.

    I might as well pipe up with a tangential opinion: I don't think there should be separate 'create' and 'update' calls for objects, just a 'save'. Do you really care which is being done?

    M-x auto-bs-mode

      Giving the prerogative to save or not to the caller has the added benefit of allowing a user to play with an object and not worry about tampering with live data, which can make an API significantly safer and more useful. The drawback to this approach is that you can get into race conditions with other users manipulating the "same" object, so in some cases it may be necessary to do some kind of blocking or locking when it comes time to save your data. For example, check out the object (with a unique ID) before saving by setting a temporary semaphore file or table entry to lock the relevant table rows.

        Exactly: you need some strategy to deal with the problem of overwrites rather than designing a whole system so they don't happen. You can create a unique ID on fetch, you can have a timestamp field automatically update on an insert/update and compare them before updating the object. Plus there's always the strategy of 'last-write wins', which is a legitimate trade-off in many situations.

        M-x auto-bs-mode

Re: OO Perl & RDBMS Strategy Question
by simeon2000 (Monk) on Aug 30, 2002 at 03:07 UTC
Re: OO Perl & RDBMS Strategy Question
by fglock (Vicar) on Aug 30, 2002 at 02:44 UTC

    I cache all accessed object information in memory, and whatever is modified is stored back in the database at the end of the CGI call. It also sets some flags in the database that tell other processes if they have to reload their object data into memory. This works better if you have lots of reads and just a few writes. It doesn't work well for very large databases, unless you have a very smart cache control, or lots of memory.

Re: OO Perl & RDBMS Strategy Question
by Joost (Canon) on Aug 30, 2002 at 10:57 UTC
    How do you do it?

    I've build my own persistent objects solution for a client that had some very strict rules on their system that made most CPAN modules unusable.

    In this system I have one or more engines that correspond to a DBI connection, and other objects that correspond to rows in a table (one table per class).

    You can fetch specific objects from the engine by giving it the classname and id (or constraints that correspond to WHERE queries):

    my $object = $engine->get('Some::Object',$object_id); my @objects = $engine->mget($class,{ name => value, name2 => value2 } +);
    Saving is done by explicitly calling the save() method:
    my $value = $object->get('name'); $object->set('name','other value'); $object->save();
    And objects are passed their engine at creation, so they know where to save their data:
    my $object = Object->new($engine,%attributes);
    This system doesn't try to be transparant, which has two advantages:
    • The programmer knows when updates/insert statements are sent to the database: when the save() method is called. (or he can choose to destroy the object without saving)
    • The programmer can do searches on key => value pairs or WHERE queries, which turns out to be very useful (after all, when you've got a database, you'd better make use of the features it has too).

    I've recently tried Class::DBI and it seems to do it pretty much the same, except that the DBI connection is global for a given class, which I don't like one bit.

    -- Joost downtime n. The period during which a system is error-free and immune from user input.
Re: OO Perl & RDBMS Strategy Question
by BrowserUk (Pope) on Aug 30, 2002 at 03:07 UTC

    I've not used DB's from Perl yet and am not familiar with the modules available, but if I was setting out to write something I think I would (probably) opt for something that used the *nix fork strategy? By which I mean:

    When an object is created, it issues a select for the whole table (or possible several smaller chunks and accumulates) so that it has its data available for immediate read access. Then, when a record (or records if that makes sense) is updated, it writes back to the DB. Of course, this assumes that only this object ever has write access to that part of the DB.

    If that scenario is true, but the updates are sufficiently large to mean that updates cause an unacceptable delay to the front end, then the updates could be queued to an update 'deamon' process. Something like the 'lazy-writeback' used on disk drives? This is not-trivial to get right, and is vunerable to power failures etc. If the DBM supports transactions, some of that can be mitigated, especially if the transactions to the DB were queued via disk, or probably better, in parrellel to disc and removed from disc once the transaction is confirmed.

    Effectively, this is just re-inventing the well-known 3-tier model.

    Well It's better than the Abottoire, but Yorkshire!Perl hacker's do it with less strokes!
      I hope I am not taking the following quote out of context:

      When an object is created, it issues a select for the whole table (or possible several smaller chunks and accumulates) so that it has its data available for immediate read access

      Selecting an entire table into memory becomes a real problem when you're dealing with large tables. Also, selecting the table into memory does not lock it. One important thing to remember about some databases (Oracle for one) is that the DB makes every effort to keep frequently accessed blocks of data in memory so that you don't have to do a physical read/write when doing selects, inserts, updates, deletes. This explains (in part) why Oracle soaks up so much memory and CPU.

      I like chicken.
Re: OO Perl & RDBMS Strategy Question
by snellm (Monk) on Aug 30, 2002 at 14:03 UTC

    My usual apprach is to have value objects that can be loaded or saved to a datastore, ie:

    $datastore->beginTransaction(); my $person = $datastore->getPersonById($personid); ...manipulate $person... $datastore->store($person); $datastore->commitTransaction();

    The main advantages of this is:

    • Simplicity: The CRUD approach (Create, Retrieve, Update, Delete) is well understood and aligns well with SQL
    • Performance: Access to the datastore is kept to a minimum
    • Modularity: The datastore is decoupled from the value objects

    I feel it's usually best to leave transaction management to the datastore, rather than trying to hack together something yourself.

    -- Michael Snell

      Just so that everyone is aware, Oracle has an pseudo-column, "rowid", which is select-able and is unique to any given row in the database. If that psuedo-column is selected upon the edit of the row and thus the original value(s) comapared after any changes, the editing user may be rejected, warned or given the option to save changes.

        And in PostgreSQL you can do that with xmin and ctid. xmin holds the last transaction id to update/update the row and ctid holds the physical location of the tuple in the table. If you use these then you've got to do it with full knowledge of PostgreSQL's multiversion concurrency control and how that affects which tuples are visible when. You can augment that with the ctid = currtid(tableoid, ctid) function which will return the current valid ctid even if your known ctid is old. It also happens that if you can locate a record by ctid then you get lightning fast access since PostgreSQL knows exactly where to seek in the table to find it. I was writing some initial code to use this feature where multiple identity values were used a fallbacks: first ctid, then app object id or whatever else is normally used for identity.

        printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE

Re: OO Perl & RDBMS Strategy Question
by trs80 (Priest) on Aug 31, 2002 at 14:19 UTC
Re: OO Perl & RDBMS Strategy Question
by dragonchild (Archbishop) on Sep 21, 2004 at 20:03 UTC
    I'm coming in on the tail end of this, but here goes.

    Objects represent business concepts. RDBMS's store business data. The two (rarely) map in a clean 1-to-1 fashion. What chromatic was referring to in that article had to do with the fact that each object knows how to get the data it needs from the database and how to write that data back, should that be needed. That data would be retrieved if and only if it is needed, kept in memory for as short a period of time, then updated as appropriate.

    An important point to remember is that most objects within a web application have a TTL of under a second - just long enough for the request to be serviced. Synchronization is very rarely a concern, even in a mod_perl environment - for that very reason.

    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://193968]
Approved by Zaxo
Front-paged by jarich
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (6)
As of 2017-02-24 05:10 GMT
Find Nodes?
    Voting Booth?
    Before electricity was invented, what was the Electric Eel called?

    Results (351 votes). Check out past polls.