
Caching DBM hash tie

by Tardis (Pilgrim)
on Jul 16, 2003 at 00:41 UTC ( #274642=perlquestion )
Tardis has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks,

I have a problem. In a nutshell, I have a homegrown database application that uses a DBM tie to provide data access. Concurrency is handled by flock, which allows either a single writer or many readers at any one time.
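For readers unfamiliar with the scheme, here is a minimal sketch of single-writer / many-reader locking with flock. The helper name `with_lock` and the separate lock file are assumptions for illustration, not the application's actual code.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: run $code while holding a lock on $lockfile.
# $mode is LOCK_SH (many readers at once) or LOCK_EX (one writer, no readers).
sub with_lock {
    my ($lockfile, $mode, $code) = @_;

    # Read/append mode creates the file if needed and never truncates it.
    open(my $fh, '+>>', $lockfile) or die "Cannot open $lockfile: $!";
    flock($fh, $mode)              or die "Cannot lock $lockfile: $!";

    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;

    close($fh);    # closing the handle releases the lock
    die $err unless $ok;
    return wantarray ? @result : $result[0];
}
```

A writer would wrap its update in `with_lock($file, LOCK_EX, sub { ... })` and a reader in `with_lock($file, LOCK_SH, sub { ... })`. Everything inside the LOCK_EX callback blocks every other user, which is why long write operations hurt so much.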

As the number of people using this application grows, it's not scaling too well, for obvious reasons. We need row-level locking, or SQL, or something.

To give you an idea of the seriousness, some write operations are taking upward of 30 seconds. That's 30 seconds during which no one can use the system at all. That's bad for an interactive app, in anyone's language :-)

Yes, SQL is on the cards, but it will require a major rewrite, which we aren't really ready for yet. Half-assed SQL conversions (basically a DBM file in SQL) are also an option, but require some API changes.

This morning I thought of a change that would require very few changes to the API, with plenty of potential gains. It might also be generally useful for the Perl community at large.

My idea is to provide a wrapper around a DBM tie that would provide the following extra functionality:

  1. Delayed writes
  2. Read caching

So write operations would always succeed, instantly. Read operations on common keys would be cached in memory (this may not be a win; I imagine the OS caching probably already gets you this for free with a DBM tie).

Delayed writes are the huge win, though. If write operations can be deferred until the system is not busy, or until a certain amount of time has elapsed, interactive performance will improve markedly.
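A minimal in-process sketch of such a wrapper follows. The class name `DelayedWriteHash` and its `flush` method are hypothetical, not an existing CPAN module. It assumes the wrapper is handed a reference to the already-tied DBM hash, and that the caller takes the exclusive lock around `flush`. The read cache can go stale if another process writes to the DBM file, and deferring writes across processes would still need the separate server described next.

```perl
use strict;
use warnings;

package DelayedWriteHash;    # hypothetical name, not an existing CPAN module

# $backend is a reference to the real (tied DBM) hash.
sub TIEHASH {
    my ($class, $backend) = @_;
    return bless {
        backend => $backend,
        pending => {},    # writes not yet pushed to the backend
        deleted => {},    # deletes not yet pushed to the backend
        cache   => {},    # values already read from the backend
    }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    delete $self->{deleted}{$key};
    $self->{pending}{$key} = $value;    # returns at once, no DBM access
}

sub FETCH {
    my ($self, $key) = @_;
    return if $self->{deleted}{$key};
    # Pending writes win, so a reader never sees data older than its own write.
    return $self->{pending}{$key} if exists $self->{pending}{$key};
    $self->{cache}{$key} = $self->{backend}{$key}
        unless exists $self->{cache}{$key};
    return $self->{cache}{$key};
}

sub EXISTS {
    my ($self, $key) = @_;
    return '' if $self->{deleted}{$key};
    return exists $self->{pending}{$key} || exists $self->{backend}{$key};
}

sub DELETE {
    my ($self, $key) = @_;
    my $old = $self->FETCH($key);
    delete $self->{pending}{$key};
    delete $self->{cache}{$key};
    $self->{deleted}{$key} = 1;
    return $old;
}

# Push all buffered changes to the backend in one batch.
sub flush {
    my ($self) = @_;
    delete $self->{backend}{$_} for keys %{ $self->{deleted} };
    for my $key (keys %{ $self->{pending} }) {
        $self->{backend}{$key} = $self->{cache}{$key} = $self->{pending}{$key};
    }
    $self->{pending} = {};
    $self->{deleted} = {};
}

# Iteration flushes first so that keys() reflects the buffered changes.
sub FIRSTKEY {
    my ($self) = @_;
    $self->flush;
    keys %{ $self->{backend} };    # reset the backend iterator
    my $key = each %{ $self->{backend} };
    return $key;
}

sub NEXTKEY {
    my ($self) = @_;
    my $key = each %{ $self->{backend} };
    return $key;
}

package main;

my %dbm;    # stands in for the real tied DBM hash
my $wrapper = tie my %db, 'DelayedWriteHash', \%dbm;
$db{balance} = 100;                    # buffered in memory only
print "read back: $db{balance}\n";     # served from the pending writes
$wrapper->flush;                       # now the DBM hash is updated
```

Existing code keeps reading and writing hash keys as before. The only new call is `flush`, which could run from a timer or when the application is idle.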

I imagine that the simplest implementation would be a module that talks to a named pipe or socket, with a separate script running on the other side. A nice failsafe could be that the client side falls back to normal tie behaviour if the server side can't be contacted.
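The failsafe could look roughly like the sketch below. It assumes a Unix-domain socket and a made-up one-line, tab-separated protocol (so keys and values must not contain tabs or newlines). `store_value` and the `STORE` message are hypothetical names, and the server side is not shown.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;

# Hypothetical helper: hand the write to the write-behind server if it is
# reachable; otherwise fall back to the caller-supplied direct write.
sub store_value {
    my ($socket_path, $key, $value, $direct_store) = @_;

    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_STREAM(),
        Peer => $socket_path,
    );

    if ($sock) {
        print {$sock} "STORE\t$key\t$value\n";    # assumed line protocol
        close($sock);
        return 'queued';
    }

    # Server not reachable: behave like the plain tie (lock, write, unlock).
    $direct_store->($key, $value);
    return 'direct';
}
```

Here `$direct_store` would be a closure that takes the usual exclusive flock and writes to the tied hash, so the application keeps working, just more slowly, whenever the server is down.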

Firstly, am I insane to think this could be done well in pure Perl?

Secondly, has anyone already done this? I can't see anything on CPAN, but maybe I'm not looking in the right places.

Lastly, I really, really need to be able to preserve my current behaviour where database operations are simply reads or writes to hash keys. I'm moving away from that but it's a slow process. Any solutions which change the way I'd read and write my data significantly are really useless to me at this stage.


Replies are listed 'Best First'.
Re: Caching DBM hash tie
by waswas-fng (Curate) on Jul 16, 2003 at 00:51 UTC
    Take a step back and think about how much time you are willing to invest in plugging the hole vs. replacing the dam. I understand that the DBM -> SQL jump seems like a long journey, but it may well be worth it now. Instead of looking to extend tied DBM with delayed writes etc., think about how you could scope out DBI access and provide hash-like behavior that you can s/oldhash/newfunc/g into your code. Once the speed issue is resolved, go back and make the app's core more fitting to a true SQL app. IMHO you will end up better off in the long run, and may even spend about the same amount of time as on the current fix.

      It's actually not really that much of a bandaid, when you think about it.

      It's a general purpose extension to a simple tie that provides SQL-like concurrency to a standard hash mechanism.

      For what it's worth, the code to convert the app to a half-baked SQL solution is actually written. There is just a fairly high scare factor in actually using it.

      Current code uses the ability to lock the entire database as a way of ensuring data integrity (say during financial operations). That needs to still happen, but if we left all the locks in as they were we'd gain no benefit from having SQL.

      The solution was to use PostgreSQL's SELECT ... FOR UPDATE in appropriate places, along with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, to lock rows we are about to (or may be about to) change.

      All of these things mean API changes and possibly data integrity issues if not done correctly.

      Breaking the API as little as possible is a real issue here.

        You may be able to get some performance boost with MLDBM::Sync; it allows you to batch locks and cache. But any row-level locking you try to accomplish still has the same problems you describe above...

Re: Caching DBM hash tie
by cees (Curate) on Jul 16, 2003 at 04:02 UTC

    You mention that you are having scaling problems, but you don't give us any indication of the numbers. This makes it hard to give suggestions since we have no idea how serious the issue is.

    30 seconds to perform a write to a DBM file is insane. How much are you writing? How long is the DBM opened for write access? Are you opening the DBM for write, reading data, performing some calculations and then writing? Do you have more writers than readers? How much data are you talking about? Are the same values consistently being changed, or are changes spread out across your dataset (row-level locking won't improve concurrency if everyone wants to change the same value)?

    Unless you are getting really high hit rates, I would guess that you could optimize your code to remove some of the delays you are observing. I would definitely look into MLDBM::Sync if you are not already using it. It may allow you to minimize the time your DBM needs to be open for writing.

    In other words, I would look at optimizing your code before looking into delayed writes.

    And if all else fails, you could always throw more hardware at your problem...

    - Cees

Re: Caching DBM hash tie
by sgifford (Prior) on Jul 16, 2003 at 04:40 UTC
    I did something like this once to speed up changes to the password file. Many simultaneous writers were causing too much of a delay, much like you're seeing.

    I prototyped a system where, in essence, if the password file was locked the writer would write to a log instead, and whenever a writer got hold of the lock, it would make its own changes along with any changes in the logfile. It was more complicated than that, using locking and atomic filesystem operations to prevent race conditions and deadlock, but it worked, and provided a huge speedup. Essentially, this log file acted as what database folks call a log and what filesystem folks call a journal. You could get this same effect more simply by having all writers write to a logfile, and running an updater from cron periodically.
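    A minimal sketch of the simpler variant (all writers append to a logfile, a periodic updater applies the batch), not the original prototype. The names `log_write` and `apply_log` and the tab-separated line format are assumptions, so keys must not contain tabs and values must not contain newlines. Locking of the database itself is left to the caller.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Writer side: append one change to the log under a brief exclusive lock.
sub log_write {
    my ($logfile, $key, $value) = @_;
    open(my $fh, '>>', $logfile) or die "Cannot open $logfile: $!";
    flock($fh, LOCK_EX)          or die "Cannot lock $logfile: $!";
    print {$fh} "$key\t$value\n";
    close($fh) or die "Cannot close $logfile: $!";    # flushes, then unlocks
}

# Updater side (for example, run from cron): apply every logged change to
# the database hash in one pass, then empty the log. Returns the number of
# changes applied. $db is a reference to the (tied) database hash.
sub apply_log {
    my ($logfile, $db) = @_;
    open(my $fh, '+<', $logfile) or return 0;    # no log yet: nothing to do
    flock($fh, LOCK_EX) or die "Cannot lock $logfile: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;              # skip malformed lines
        $db->{$key} = $value;                    # later entries override earlier ones
        $count++;
    }

    truncate($fh, 0) or die "Cannot truncate $logfile: $!";
    close($fh);
    return $count;
}
```

    Writers hold the log lock only long enough to append one line, so they no longer wait on the slow database update. The cost is that readers do not see a change until the updater has run.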

    There were two reasons why this worked. First, in our situation it was OK for write requests to be delayed for a few minutes, and for future read requests to not see the writes right away. Otherwise it would have involved modifying all readers to look in the log first, and we would have lost any speed advantages. Second, updating the password file with many changes was about the same speed as for a single change, so batching the requests made a significant speed improvement. If either of these isn't true for your application, you'll find it much harder.

    Perhaps tellingly, we ended up scrapping the prototype and just buying a faster system.

Re: Caching DBM hash tie
by bobn (Chaplain) on Jul 16, 2003 at 02:01 UTC

    I can't help thinking that you are engaged in a very non-trivial endeavor.

    Just for example, you're caching reads while delaying writes. So what happens when someone reads something and it doesn't match what was 'written' but is in delay mode?

    This smacks of a virtual memory system overlaid on a home-rolled filesystem DB. I can't help but think you will incur serious agony on this one. Other posters have said to migrate to a real DB, with an adaptation layer to keep the current code going if need be. I agree.

    --Bob Niederman,

Node Type: perlquestion [id://274642]
Approved by Zaxo