Berkeley DB performance, profiling, and degradation...

by SwellJoe (Scribe)
on Feb 19, 2002 at 14:13 UTC

SwellJoe has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I've been working for a couple of weeks on a utility to index up to a couple million web objects (just the meta-data, not the actual objects) into a Berkeley DB.

In this program I've used a tied hash for storage of the data, so the db is used transparently. No explicit file locking is taking place at this point. Each object entry requires at least one FETCH (to see if the object already exists) and one STORE (potentially more, as a parent tree is built for every object--even if the parents don't actually exist in the cache, in which case they are marked as non-existent). Each entry has a URL, an existence indicator, and a list of child objects.
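
For reference, the tie-based setup looks roughly like the sketch below. The file name, the record layout, and the MD5-of-URL key scheme are illustrative guesses (the MD5 part suggested by the Digest::MD5 calls in the profiles further down), not my actual code:

    # A minimal sketch of the tied-hash setup described above.
    use strict;
    use DB_File;
    use Fcntl;
    use Digest::MD5 qw(md5_hex);

    tie my %index, 'DB_File', '/tmp/squid_meta.db', O_CREAT|O_RDWR, 0640, $DB_HASH
        or die "Cannot tie database: $!";

    my $url = 'http://example.com/some/object';
    my $key = md5_hex($url);

    unless (defined $index{$key}) {              # one FETCH per object...
        # ...and at least one STORE: URL, existence flag, child list
        $index{$key} = join("\0", $url, 1, '');
    }

    untie %index;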

It is working correctly, but is nowhere near fast enough for the task at hand (which is to follow a running Squid and index the object store). It is quite zoomy at first, handling over 200 entries per second for the first few thousand objects. But then performance degrades rapidly, eventually climbing to more than a second per entry. I was considering a rewrite in C, but before embarking on such a drastic and time-consuming journey, I decided to do some profiling. What I found surprised me...so much so that I think I must be misinterpreting the results or (more likely) using the DB_File module in ways that are inefficient.
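
(The tables below have the shape of Devel::DProf output formatted with dprofpp; assuming that tool chain, a profiling run looks something like the following, with the script name being a placeholder.)

    perl -d:DProf indexer.pl    # writes tmon.out in the current directory
    dprofpp tmon.out            # prints the exclusive/cumulative time table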

Here's the result of a profile of a ~30000 object run:

Total Elapsed Time = 230.4568 Seconds
  User+System Time = 227.6368 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 44.7   101.8 288.02  45981   0.0022 0.0063  main::add_entry
 30.5   69.57 68.241 266504   0.0003 0.0003  File::QuickLog::print
 10.3   23.62 23.501  25216   0.0009 0.0009  DB_File::FETCH
 8.78   19.97 19.737  45981   0.0004 0.0004  DB_File::STORE
 5.78   13.15 228.91  20765   0.0006 0.0110  main::process_file
 1.47   3.348  3.208  25846   0.0001 0.0001  main::find_parent
 1.35   3.077 696.47    130   0.0237 5.3575  main::recurse_dir
 0.84   1.919  1.667  45981   0.0000 0.0000  Digest::MD5::md5_hex
 0.07   0.160  0.160      1   0.1600 0.1600  main::get_cache_dirs
 0.05   0.120  0.498      5   0.0240 0.0995  main::BEGIN
 0.03   0.070  0.070      1   0.0700 0.0700  DB_File::DESTROY
 0.03   0.070  0.069    149   0.0005 0.0005  File::QuickLog::_datetime
 0.03   0.060  0.090     31   0.0019 0.0029  Exporter::import
 0.02   0.050  0.139      6   0.0083 0.0232  IO::File::BEGIN
 0.02   0.040  0.040      2   0.0200 0.0200  File::Spec::Unix::BEGIN
Now, from this, it is clear that my add_entry routine is taking most of the time, with the logging functions following close behind. I've already stripped the routine to the bone, and runtime dropped to a quarter of its original. I've also moved the database (which never grows larger than 100MB) onto a RAM disk to remove disk latency, which brought the runtime down further. Woohoo! So I've done all I can think of to make things go fast enough. But it is still too slow for real-world use, as indicated by this profile of a run of about 60,000 objects, roughly twice the size of the previous run:
Total Elapsed Time = 2242.189 Seconds
  User+System Time = 2120.369 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 46.6   988.9 2678.0 104115   0.0095 0.0257  main::add_entry
 30.4   645.9 644.18 613435   0.0011 0.0011  File::QuickLog::print
 14.0   298.3 298.18  54727   0.0055 0.0054  DB_File::FETCH
 6.57   139.3 139.01 104115   0.0013 0.0013  DB_File::STORE
 1.75   37.14 2117.5  49388   0.0008 0.0429  main::process_file
 0.44   9.374 6381.5    242   0.0387 26.370  main::recurse_dir
 0.37   7.777  7.586  55566   0.0001 0.0001  main::find_parent
 0.26   5.568  5.206 104115   0.0001 0.0000  Digest::MD5::md5_hex
 0.01   0.160  0.160      1   0.1600 0.1600  main::get_cache_dirs
 0.01   0.120  0.508      5   0.0240 0.1017  main::BEGIN
 0.01   0.110  0.109    261   0.0004 0.0004  File::QuickLog::_datetime
 0.00   0.080  0.080      1   0.0800 0.0800  DB_File::DESTROY
 0.00   0.070  0.069    259   0.0003 0.0003  IO::File::open
 0.00   0.060  0.269      6   0.0100 0.0448  File::Tail::BEGIN
 0.00   0.050  0.060      4   0.0125 0.0150  IO::Seekable::BEGIN
Everything dealing with the DB is several times slower! Even so, we're still able to maintain reasonable speeds at this point...

The problem is that this slowing continues as more objects are entered into the database. Yes, I know it has to get slower as the db has to seek more for each FETCH and STORE, but isn't it the job of the Berkeley DB to manage the data efficiently enough that a retrieval or store doesn't require seconds, which is where we end up by the time the db has a million entries? Squid itself manages to store and retrieve entire objects plus this tiny amount of meta-data several orders of magnitude faster than I'm able to coax from the BerkeleyDB module.

What is interesting, to me, is that my add_entry routine isn't doing any more work for an entry at 500,000 than it is for the very first entry. It is pulling its data from a single file of the exact same format every time, parsing it, and then feeding it into the db-tied hash. The only difference is the size of the database--so the speed reduction isn't in my Perl code, but in the Berkeley DB, as far as I can tell.

So I'm posting in hopes that someone here knows how to make the Berkeley DB perform reasonably with a 50-100MB database (I know...it's not a very big DB; I thought performance would be automatic).

One possible way to proceed would be to use the database only for downtime storage, with an in-memory hash (no ties) for normal operation. This is not an elegant solution, as I would then have to integrate the program that uses this database into this one, which means multiplexing two disparate functions into a single daemon process so they could efficiently share the data. I don't know if this would be significantly faster, however, if at all.
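
If I went that way, the shape of it would be something like the sketch below (Storable and the state-file path are just placeholders for whatever downtime-storage format I'd settle on):

    use strict;
    use Storable qw(nstore retrieve);

    my $state_file = '/var/cache/indexer/meta.stor';    # placeholder path

    # Load the whole index into a plain, untied hash at startup...
    my %index = -e $state_file ? %{ retrieve($state_file) } : ();

    # ...all normal FETCH/STORE traffic is then ordinary hash access...
    $index{'some-key'} = 'some-value';

    # ...and the hash is written back out at shutdown (or periodically).
    nstore(\%index, $state_file);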

I've got plenty of memory to work with (virtually unlimited in this scenario--the total meta-index only requires 50-100MB, so my 2GB boxes won't even notice if I keep it in core), so I can trade memory for speed. I've tried increasing the $DB_HASH->{cachesize} parameter to 8 and 16MB with literally no measurable result, which is not too surprising since I'm already operating only on RAM. Same result from switching to a $DB_BTREE database.
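
For reference, the cachesize experiment amounts to something like this, done before the tie (16MB shown):

    use DB_File;
    use Fcntl;

    # Raise the cache before tying; $DB_BTREE works the same way for the
    # btree variant of the same experiment.
    $DB_HASH->{'cachesize'} = 16 * 1024 * 1024;
    tie my %index, 'DB_File', '/tmp/squid_meta.db', O_CREAT|O_RDWR, 0640, $DB_HASH
        or die "Cannot tie database: $!";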

Another thought was to move the DB writes into their own function and perform 'batch' entries by calling the function with a reference to a hash of db entries. However, I can't seem to locate any documentation on how to do a batch STORE...also, the current code is moderately recursive, making batches difficult (each batch would be 256 objects, and some of those objects may need to update others in the same batch, breaking out of batch mode--I would need to treat the temporary storage the same way I currently treat the persistent storage). Complexity in this case would go way up, so I'm loath to even begin down that path if there is any other alternative.
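
If I did go down that path, the batching would presumably look something like the sketch below: buffer entries in a plain hash, check the buffer before the tied hash so in-batch updates still resolve, and flush every 256 records. All names here are illustrative, and %index is the DB-tied hash from the setup above:

    my %buffer;                      # pending entries, not yet in the db
    my $batch_size = 256;

    sub batch_store {
        my ($key, $value) = @_;
        $buffer{$key} = $value;
        flush_buffer() if keys(%buffer) >= $batch_size;
    }

    sub batch_fetch {
        my ($key) = @_;
        # Check the pending batch first so entries can update each other
        # before the batch is written out.
        return exists $buffer{$key} ? $buffer{$key} : $index{$key};
    }

    sub flush_buffer {
        $index{$_} = $buffer{$_} for keys %buffer;
        %buffer = ();
    }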

Anyway, if someone has experience with DB_File indexing ~1 million objects (tiny objects--maximum of a couple hundred bytes each), with good performance, I'd be much obliged if you'd tell me how you did it.

Thanks for any pointers anyone can provide.

Re: Berkeley DB performance, profiling, and degradation...
by merlyn (Sage) on Feb 19, 2002 at 14:39 UTC
    This seems too big for B-DB, and especially the tie-DBM interface. Have you looked at using something like PostgreSQL?

    Or, if you're just logging, go very-lo-tech to scribble things to a log file, and then have a separate "store into the database" process running at a lower priority for when you want to access the data randomly.

    -- Randal L. Schwartz, Perl hacker

      Thanks for your reply, Randal.

      Actually, I'm explicitly avoiding a more complex relational database for the reason that every action is so costly--a simple key:value DB should be much, much faster. In fact, I think MySQL is built on top of the Berkeley DB, so performance of MySQL (which in some circumstances is faster than PostgreSQL) should be at least one abstraction layer slower than Berkeley DB.

      The process in question is the low-priority "store into a database" process--we already have logs, and when in daemon mode the index will be drawing its data from a File::Tail of the log file. The problem is that it will fall behind very quickly with a large database. There is no downtime in which it can catch up--the indexer must maintain 10-15 entries per second, or it will lose data. Not to mention that a full index from scratch (required on occasion) will take two days or more. All very problematic.
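
      Roughly speaking, the daemon-mode tail loop is just this shape (the log path and the parse step are placeholders, not my actual code):

          use File::Tail;

          my $tail = File::Tail->new(
              name        => '/var/log/squid/store.log',   # placeholder path
              interval    => 1,
              maxinterval => 5,
          );

          while (defined(my $line = $tail->read)) {
              # parse the log line and hand it to the indexer, e.g.
              # add_entry( parse_store_line($line) );
          }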

      So perhaps changing to another method of entering and retrieving from the database is in order--I'm not married to the tied hash concept, it just seemed the Right Way when I got started. Hmm...Back to the perldocs.

        MySQL itself is not based on BerkeleyDB; the BerkeleyDB and InnoDB table types were added during the 3.23.x series to support transactions.

        Some speed propaganda on MySQL can be found on their benchmarking page

        DISCLAIMER: I am not advocating MySQL as a solution to this particular problem. I am only attempting to dispel the myth that MySQL is itself based on BerkeleyDB.
Re: Berkeley DB performance, profiling, and degradation...
by perrin (Chancellor) on Feb 19, 2002 at 14:51 UTC
    It does seem very strange that DB_File would get slower as the amount of data grows. It's supposed to be constant, and there are people using it with terabyte-size databases. Anyway, here are some things you might try:

    - Use SDBM_File. It is much faster in some situations, though it has a limited record length. Take a look at these benchmarks from the MLDBM::Sync documentation; a short tie sketch follows this list.
    - Use BerkeleyDB instead of DB_File. It's newer and may perform better.
    - Use Cache::FileBackend, IPC::MM, or Cache::Mmap instead of a dbm file.
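
    A bare-bones MLDBM::Sync tie looks roughly like this (the file name is a placeholder; swap DB_File for SDBM_File in the MLDBM line to compare back-ends):

        use MLDBM::Sync;
        use MLDBM qw(SDBM_File Storable);    # back-end and serializer
        use Fcntl qw(:DEFAULT);

        tie my %cache, 'MLDBM::Sync', '/tmp/sync_index.dbm', O_CREAT|O_RDWR, 0640
            or die "Cannot tie MLDBM::Sync database: $!";

        # values are serialized transparently; mind SDBM_File's ~1K record limit
        $cache{'some-key'} = { url => 'http://example.com/', exists => 1 };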

        I say it's surprising because a hash algorithm is supposed to maintain a fairly constant lookup time when you put more data into it. Maybe switching between the hash and BTree options of DB_File would make a difference.

        I have used BerkeleyDB with the 3.x series from Sleepycat pretty extensively. The main advantages it offers are in the area of fancier locking and caching. With a single writer and the data on a RAM disk, these aren't likely to make much difference. It's worth a shot though.

      Interesting! Thanks for the link perrin. This looks very promising...

      Very few of my entries will go over the 1024-byte mark, so this might be the ideal solution for me. I will read up on the other options you've mentioned as well, and get back with results. I like SDBM_File and will try it first, since I won't have to change any of the access code, just the setup and teardown.

Re: Berkeley DB performance, profiling, and degradation...
by SwellJoe (Scribe) on Feb 20, 2002 at 13:22 UTC
    Further along in the journey, but still haven't made it home:

    I've followed the advice of perrin and crazyinsomniac, and switched to BerkeleyDB (which made things just a touch slower than the original DB_File based program--but not enough to worry over). I then followed Randal's advice to lose the tie interface (my reason for sticking with BerkeleyDB rather than reverting to DB_File is that the direct API for BerkeleyDB is better documented and a bit more complete).
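
    Stripped to its essentials, the direct-API version of the fetch/store path looks something like this (the file name, cache size, and record layout here are placeholders rather than my actual code):

        use strict;
        use BerkeleyDB;
        use Digest::MD5 qw(md5_hex);

        my $db = BerkeleyDB::Hash->new(
            -Filename  => '/mnt/ramdisk/squid_meta.bdb',   # placeholder path
            -Flags     => DB_CREATE,
            -Cachesize => 16 * 1024 * 1024,
        ) or die "Cannot open database: $BerkeleyDB::Error";

        my $url = 'http://example.com/some/object';
        my $key = md5_hex($url);

        my $value;
        if ($db->db_get($key, $value) != 0) {              # not found yet
            $db->db_put($key, join("\0", $url, 1, ''));    # url, exists flag, children
        }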

    Dropping the reliance on a tied hash for db entries gave me another 10% speed boost! So, the numbers now look like this for the 'long test' (which is still only a small percentage of the full working set):

    Total Elapsed Time = 2010.809 Seconds
      User+System Time = 1889.459 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     41.8   790.8 2354.8 105111   0.0075 0.0224  main::add_entry
     32.9   623.2 621.75 618805   0.0010 0.0010  File::QuickLog::print
     15.4   291.0 290.92  55361   0.0053 0.0053  BerkeleyDB::Common::db_get
     7.20   136.0 135.74 105111   0.0013 0.0013  BerkeleyDB::Common::db_put
     1.92   36.27 1886.0  49750   0.0007 0.0379  main::process_file
     0.45   8.574 5684.3    242   0.0354 23.489  main::recurse_dir
     0.40   7.528  7.361  56262   0.0001 0.0001  main::find_parent
     0.31   5.858  5.545 105111   0.0001 0.0001  Digest::MD5::md5_hex
     0.01   0.160  0.160      1   0.1600 0.1600  main::get_cache_dirs
     0.01   0.160  0.878      5   0.0320 0.1756  main::BEGIN
     0.01   0.150  0.509      1   0.1498 0.5087  IO::import
     0.01   0.110  0.109    261   0.0004 0.0004  File::QuickLog::_datetime
     0.01   0.100  0.120     64   0.0016 0.0019  Exporter::import
     0.00   0.080  0.110      6   0.0133 0.0183  IO::Socket::BEGIN
     0.00   0.070  0.070      1   0.0700 0.0700  BerkeleyDB::Term::close_everything

    So we've shaved 232 seconds off of the total elapsed runtime...a bit more than a 10% gain. 623 seconds of that is logging calls, all but ~100 seconds of which can be stripped out for the final version. More important, of course, for the long haul, is that the sec/call for all database related functions is reduced by about 15%--I'm hoping this means that the degradation of DB performance is significantly lessened by the use of direct calls rather than a tied interface.

    I'm starting a full index run right now, so hopefully by tomorrow night I'll have more to say on the subject. Here's hoping for 15 entries/sec sustainable...

    Randal gets my thanks this time, for providing the most useful suggestion in this round of tweaks.

Re: Berkeley DB performance, profiling, and degradation...
by SwellJoe (Scribe) on Feb 20, 2002 at 10:49 UTC
    For those following my plight with interest or amusement, I'm back with results of the switch to MLDBM::Sync...and those results are, disappointingly, even worse than the DB_File results.

    I tried both explicit and implicit locking (explicit should be faster per the docs), locking and unlocking once for every 256 objects entered into the DB. The difference is small but measurable. Anyway, I suspect my objects cross the line into >1024-byte territory too frequently to gain the benefit of this module over DB_File. I figured them for being smaller than that, but I must be wrong...I think I'll do an object-size histogram of the data set for my own enlightenment for the next installment.
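
    The explicit-locking experiment boils down to something like this (names are illustrative; %cache is the MLDBM::Sync-tied hash, @entries stands in for the parsed data):

        my $sync  = tied %cache;      # the MLDBM::Sync object behind the tie
        my $count = 0;

        $sync->Lock;                  # hold the lock across a batch of writes
        for my $entry (@entries) {
            $cache{ $entry->{key} } = $entry->{value};
            if (++$count % 256 == 0) {
                $sync->UnLock;        # give readers a chance between batches
                $sync->Lock;
            }
        }
        $sync->UnLock;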

    So, next verse, quite similar to the first...I'll be converting everything over to the newer BerkeleyDB module, with further experiments in locking and not locking. More profiling to come. And some day, a solution.
