Re^5: Multiple write locking for BerkeleyDB

by sgifford (Prior)
on Apr 24, 2008 at 02:18 UTC (#682544)


in reply to Re^4: Multiple write locking for BerkeleyDB
in thread Multiple write locking for BerkeleyDB

It really isn't any more complex, it's just that DBI and mysql wrap most of that up for you. Any synchronized solution has to take some kind of mutex, read and write the counter, then release the mutex, and that is all my code does. If you were to use strace or gdb to step through the code involved in both cases (including the MySQL server), you'd find that SysV is much simpler. You just have to do more of the coding yourself, because it's not as widely used.
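A minimal sketch of that take-mutex, read, write, release cycle, using only Perl's built-in SysV calls. This is not the benchmark code from the original post; the names `locked_add`, `read_counter`, and `run_workers` are made up for illustration, and the worker and iteration counts are arbitrary. It assumes a Unix-like system with SysV IPC enabled.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_CREAT IPC_RMID S_IRUSR S_IWUSR SETVAL);

my $perms = S_IRUSR | S_IWUSR;

# One semaphore used as a mutex, initialised to 1 (unlocked).
my $semid = semget(IPC_PRIVATE, 1, IPC_CREAT | $perms);
defined $semid or die "semget: $!";
defined semctl($semid, 0, SETVAL, 1) or die "semctl SETVAL: $!";

# Four bytes of shared memory holding the counter as a 32-bit integer.
my $shmid = shmget(IPC_PRIVATE, 4, IPC_CREAT | $perms);
defined $shmid or die "shmget: $!";
shmwrite($shmid, pack('L', 0), 0, 4) or die "shmwrite: $!";

# Hypothetical helper: read the counter without taking the mutex.
sub read_counter {
    my $buf = '';
    shmread($shmid, $buf, 0, 4) or die "shmread: $!";
    return unpack('L', $buf);
}

# Hypothetical helper: add $delta to the counter under the mutex.
sub locked_add {
    my ($delta) = @_;
    semop($semid, pack('s!3', 0, -1, 0)) or die "semop lock: $!";    # take mutex
    my $value = read_counter() + $delta;                             # read
    shmwrite($shmid, pack('L', $value), 0, 4) or die "shmwrite: $!"; # write
    semop($semid, pack('s!3', 0, 1, 0)) or die "semop unlock: $!";   # release mutex
    return $value;
}

# Hypothetical driver: fork $workers children that each add 1, $adds times.
sub run_workers {
    my ($workers, $adds) = @_;
    my @pids;
    for (1 .. $workers) {
        my $pid = fork();
        defined $pid or die "fork: $!";
        if ($pid == 0) {
            locked_add(1) for 1 .. $adds;
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    return read_counter();
}

my $total = run_workers(4, 250);
print "counter = $total\n";

# SysV objects outlive the process unless removed explicitly.
shmctl($shmid, IPC_RMID, 0) or warn "shmctl IPC_RMID: $!";
semctl($semid, 0, IPC_RMID, 0) or warn "semctl IPC_RMID: $!";
```

The forked children inherit the IDs here only to keep the sketch short; separately started processes would need to agree on a key instead of using IPC_PRIVATE.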

As for memory tables, I had an error in my benchmark: I was running it with no rows in the table, so no updates were happening at all. I have corrected it in the original post and also tried a MySQL memory table; SysV is about 4.7 times faster than a memory table. I also added a benchmark using mmap and an atomic add, which is much, much faster than any of the other solutions.

And as to the amount of work MySQL and SysV are doing behind the scenes, I don't see how it matters much unless the OP needs every update written immediately to disk. Otherwise MySQL is just doing extra work that the OP doesn't need.


Re^6: Multiple write locking for BerkeleyDB
by samtregar (Abbot) on Apr 24, 2008 at 16:32 UTC
    It really isn't any more complex, it's just that DBI and mysql wrap most of that up for you.

    That's an odd definition of complexity you've got there. What would you think of the equivalent solution in assembler? Or constructed using syscall() instead of the shm*() routines? Obviously these would do the same work for you, it's just that GCC "wraps most of that up for you."

    It's interesting to see that SysV semaphores perform pretty well. It doesn't match my experience with SysV shared memory, which was about as slow as just using disk, but I guess that's apples and oranges. And let's not even talk about the arbitrary kernel-level limits on this stuff (how many, how much storage, etc).... It might work great on Linux, but porting to BSD or Solaris where the defaults are much different is a guaranteed pain.

    -sam

      Since the OP's question was about performance, I was talking about computational complexity: the amount of work that the computer has to do. Obviously SysV is more complex to code.

      As for SysV semaphores and shared memory, it's not that they are particularly fast; they're just faster than performing many very small transactions with a database. It may well be that performance is the same as using a small disk file, which should also be pretty fast, as long as the OS is smart about caching and delayed writes. Look at mmap in my updated benchmarks to see fast. :-)

      Not sure about portability. I'm using only modules that come with Perl and only their most basic features, so I would expect it to be portable at least across Unix-like systems. Apart from flock, I don't know of another native Perl mechanism for an interprocess mutex.
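For comparison, a minimal sketch of the flock approach mentioned above: a counter kept in a small file, with an exclusive lock held for the whole read-modify-write. The helper name `flock_add`, the temporary file location, and the worker counts are illustrative assumptions, not anything from the original benchmark.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);
use File::Temp qw(tempdir);

# Hypothetical helper: add $delta to a decimal counter stored in $path.
sub flock_add {
    my ($path, $delta) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock: $!";               # take the mutex
    my $line  = <$fh>;                                    # read the counter
    my $value = (defined $line && $line =~ /^(\d+)/ ? $1 : 0) + $delta;
    seek($fh, 0, SEEK_SET) or die "seek: $!";
    truncate($fh, 0)       or die "truncate: $!";
    print {$fh} "$value\n" or die "print: $!";            # write it back
    close($fh) or die "close: $!";   # flushes the write, then drops the lock
    return $value;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/counter";

# Each child opens the file itself, so every process holds its own lock.
my @pids;
for (1 .. 4) {
    my $pid = fork();
    defined $pid or die "fork: $!";
    if ($pid == 0) {
        flock_add($path, 1) for 1 .. 50;
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

my $final = flock_add($path, 0);   # adding 0 just reads the current value
print "counter = $final\n";
```

Because the lock is tied to a path rather than an inherited handle, this also works for processes that were started separately.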

Re^6: Multiple write locking for BerkeleyDB
by dino (Sexton) on Apr 24, 2008 at 16:38 UTC
    That's a lot of useful information, thanks. I'm a little puzzled, however, about how to get separate processes to use the same shared memory. With fork I can use some form of handle which is inherited, but how does it work when the writer processes have been started separately?
    I had a look at IPC::MM, as it allows the creation of shared hashes (not using Storable), but it has the above problem (I think).
      See the links in my original post; all share memory between unrelated processes. IPC::MM looks very useful, but only shares memory between related processes, at least according to the MM docs at OSSP (I haven't used it myself).
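One common way for separately started processes to find the same SysV segment is to derive the key from a path that both can see, using ftok. A minimal sketch, assuming a Unix-like system: `segment_for` is a hypothetical helper, the temporary file stands in for whatever well-known path the real programs would agree on, and the reader is launched as a separate perl process to show that nothing is inherited.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_CREAT IPC_RMID S_IRUSR S_IWUSR ftok);
use File::Temp qw(tempfile);

# Hypothetical helper: find (or create) the 4-byte segment tied to $path.
sub segment_for {
    my ($path) = @_;
    my $key = ftok($path, 1);
    defined $key or die "ftok $path: $!";
    my $id = shmget($key, 4, IPC_CREAT | S_IRUSR | S_IWUSR);
    defined $id or die "shmget: $!";
    return $id;
}

# Stand-in for a path both programs know about; it must already exist.
my ($tfh, $path) = tempfile(UNLINK => 1);

# "Writer" side: store a value in the segment.
my $shmid = segment_for($path);
shmwrite($shmid, pack('L', 42), 0, 4) or die "shmwrite: $!";

# "Reader" side: a separate perl process that is given only the path.
my $reader = q{
    my $key = ftok($ARGV[0], 1);
    my $id  = shmget($key, 4, 0);
    defined $id or die "shmget: $!";
    my $buf = '';
    shmread($id, $buf, 0, 4) or die "shmread: $!";
    print unpack('L', $buf);
};
open(my $ph, '-|', $^X, '-MIPC::SysV=ftok', '-e', $reader, $path)
    or die "cannot start reader: $!";
my $seen = <$ph>;
close($ph) or die "reader failed";
print "reader saw $seen\n";

# Remove the segment; SysV objects persist until removed.
shmctl($shmid, IPC_RMID, 0) or warn "shmctl IPC_RMID: $!";
```

The same key-based lookup applies to semget, so a semaphore used as the mutex can be found the same way.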

      If you're not interested in doing the low-level work yourself (which will probably give the best performance at the expense of a lot of work), Cache::FastMmap might be helpful. It acts as a fast cache that sits in front of something else, like a database or DB file. You would mostly use the cache, and could periodically flush it out to the database to limit how much data you lose if your machine crashes. The author of this module has some interesting performance information at Comparison of different PERL caching modules.
