Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Berkeley DB performance, profiling, and degradation...

by SwellJoe (Scribe)
on Feb 19, 2002 at 14:13 UTC ( #146377=perlquestion: print w/ replies, xml ) Need Help??
SwellJoe has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I've been working for a couple of weeks on a utility to index up to a couple million web objects (just the meta-data, not the actual objects) into a Berkeley DB.

In this program I've used a tied hash for storage of the data and transparently utilizing the db. No explicit file locking is taking place at this point. Each object entry requires at least one FETCH (to see if the object already exists), and one STORE (potentially more, as a parent tree is built for every object--even if the parents don't actually exist in the cache, though they are marked as non-existent). Each entry has a URL, an existence indicator, and a list of child objects.

It is working correctly, but is nowhere near fast enough for the task at hand (which is to follow a running Squid and index the object store). It is quite zoomy up to a few thousand objects, handling over 200 entries per second for the first few thousand. But then performance degrades rapidly, eventually climbing to more than a second per entry. I was considering a rewrite in C, but before embarking on such a drastic and time-consuming journey, I decided to do some profiling. What I found surprised me...so much so that I think I must be misinterpreting the results or (more likely) using the Berkeley DB_File module in ways that are inefficient.

Here's the result of a profile of a ~30000 object run:

Total Elapsed Time = 230.4568 Seconds User+System Time = 227.6368 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 44.7 101.8 288.02 45981 0.0022 0.0063 main::add_entry 30.5 69.57 68.241 266504 0.0003 0.0003 File::QuickLog::print 10.3 23.62 23.501 25216 0.0009 0.0009 DB_File::FETCH 8.78 19.97 19.737 45981 0.0004 0.0004 DB_File::STORE 5.78 13.15 228.91 20765 0.0006 0.0110 main::process_file 1.47 3.348 3.208 25846 0.0001 0.0001 main::find_parent 1.35 3.077 696.47 130 0.0237 5.3575 main::recurse_dir 0.84 1.919 1.667 45981 0.0000 0.0000 Digest::MD5::md5_hex 0.07 0.160 0.160 1 0.1600 0.1600 main::get_cache_dirs 0.05 0.120 0.498 5 0.0240 0.0995 main::BEGIN 0.03 0.070 0.070 1 0.0700 0.0700 DB_File::DESTROY 0.03 0.070 0.069 149 0.0005 0.0005 File::QuickLog::_datetime 0.03 0.060 0.090 31 0.0019 0.0029 Exporter::import 0.02 0.050 0.139 6 0.0083 0.0232 IO::File::BEGIN 0.02 0.040 0.040 2 0.0200 0.0200 File::Spec::Unix::BEGIN
Now, from this, it is clear that my add_entry routine is taking most of the time with logging functions following close behind. I've already stripped the routine to the bone, and runtime dropped to a quarter its original. I've also moved the database (which never grows larger than 100MB) onto a RAM disk to remove disk latency, which brought the runtime down further. Woohoo! So I've done all I can think of to make things go fast enough. But it is still too slow for real-world use, as indicated by this profile of a run of about 60,000 objects, or twice the size of the previous run:
Total Elapsed Time = 2242.189 Seconds User+System Time = 2120.369 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 46.6 988.9 2678.0 104115 0.0095 0.0257 main::add_entry 30.4 645.9 644.18 613435 0.0011 0.0011 File::QuickLog::print 14.0 298.3 298.18 54727 0.0055 0.0054 DB_File::FETCH 6.57 139.3 139.01 104115 0.0013 0.0013 DB_File::STORE 1.75 37.14 2117.5 49388 0.0008 0.0429 main::process_file 0.44 9.374 6381.5 242 0.0387 26.370 main::recurse_dir 0.37 7.777 7.586 55566 0.0001 0.0001 main::find_parent 0.26 5.568 5.206 104115 0.0001 0.0000 Digest::MD5::md5_hex 0.01 0.160 0.160 1 0.1600 0.1600 main::get_cache_dirs 0.01 0.120 0.508 5 0.0240 0.1017 main::BEGIN 0.01 0.110 0.109 261 0.0004 0.0004 File::QuickLog::_datetime 0.00 0.080 0.080 1 0.0800 0.0800 DB_File::DESTROY 0.00 0.070 0.069 259 0.0003 0.0003 IO::File::open 0.00 0.060 0.269 6 0.0100 0.0448 File::Tail::BEGIN 0.00 0.050 0.060 4 0.0125 0.0150 IO::Seekable::BEGIN
Everything dealing with the DB is several times slower! Even so we're still able to maintain reasonable speeds at this point...

The problem is that this slowing continues as more objects are entered into the database--yes, I know it has to get slower as the db has to seek more for each FETCH and STORE, but isn't the job of the Berkeley DB to manage the data efficiently enough to where an object retrieval or store doesn't require seconds, which is where we end up by the time the db has a million entries? Squid itself manages to store and retrieve entire objects plus this tiny amount of meta-data at several orders of magnitude faster than I'm able to coax from the BerkeleyDB module.

What is interesting, to me, is that my add_entry routine isn't doing anymore work for an entry at 500,000 than it is for the very first entry. It is pulling it's data and parsing it from a single file of the exact same format every time, and then feeding it into the db-tied hash. The only difference is the size of the database--so the speed reduction isn't in my perl code, but in the Berkeley DB, as far as I can tell.

So I'm posting in hopes that someone here knows how to make the BerkeleyDB perform reasonably with a 50-100MB database (I know...it's not a very big DB, I thought performance would be automatic).

One of my thoughts for possible ways to proceed include using the database only for downtime storage, and using an in-memory hash with no ties for normal operation. This is not an elegant solution, as I then would have to integrate the program that uses this database into this one...leading to the requirement for multiplexing two disparate functions into a single daemon process, so they could efficiently share the data. I don't know if this would be significantly faster, however, if at all.

I've got plenty of memory to work with (virtually unlimited in this scenario--the total meta-index only requires 50-100MB, so my 2GB boxes won't even notice if I keep it in core), so I can trade memory for speed. I've tried increasing the $DB_HASH{cachesize} parameter to 8 and 16MB with literally no measurable result, which is not too surprising since I'm already operating only on RAM. Same result from switching to a $DB_BTREE database.

Another thought was to move the DB writes into their own function, and performing 'batch' entries by calling the function with a reference to a hash of db entries. However, I can't seem to locate any documentation on how to do a batch STORE...also the current code is moderately recursive, making batches difficult (each batch would be 256 objects, and some of those objects may need to update some of the others in the same batch causing a break from the batch mode--I would need to treat the temporary storage the same way I currently treat the persistent storage). Complexity in this case would go way up, so I'm loathe to even begin down that path if there is any other alternative.

Anyway, if someone has experience with DB_File indexing ~1 million objects (tiny objects--maximum of a couple hundred bytes each), with good performance, I'd be much obliged if you'd tell me how you did it.

Thanks for any pointers anyone can provide.

Comment on Berkeley DB performance, profiling, and degradation...
Select or Download Code
Re: Berkeley DB performance, profiling, and degradation...
by merlyn (Sage) on Feb 19, 2002 at 14:39 UTC
    This seems too big for B-DB, and especially the tie-DBM interface. Have you looked at using something like PostgreSQL?

    Or, if you're just logging, go very-lo-tech to scribble things to a log file, and then have a separate "store into the database" process running at a lower priority for when you want to access the data randomly.

    -- Randal L. Schwartz, Perl hacker

      Thanks for your reply, Randal.

      Actually, I'm explicitly avoiding a more complex relational database for the reason that every action is so costly--a simple key:value DB should be much, much faster. In fact, I think MySQL is built on top of the Berkeley DB, so performance of MySQL (which in some circumstances is faster than PostgreSQL) should be at least one abstraction layer slower than Berkeley DB.

      The process in question is the low priority "store into a database" process--we already have logs and when in daemon mode the index will be drawing its data from a File::Tail of the log file. The problem is that it will fall behind very quickly with a large database. There is no downtime in which it can catch up--the indexer must maintain 10-15 entries per second, or it will lose data. Not to mention a full index from scratch (required on occasion) will take two days or more. All very problematic.

      So perhaps changing to another method of entering and retrieving from the database is in order--I'm not married to the tied hash concept, it just seemed the Right Way when I got started. Hmm...Back to the perldocs.

        MySQL itself is not based on BerkeleyDB, but to allow for transactions support, BerkeleyDB and InnoDB table types were added during the 3.23.x series.

        Some speed propaganda on MySQL can be found on their benchmarking page

        DISCLAIMER: I am not advocating MySQL as a solution to this particular problem. I am only attempting to dispel the myth that MySQL is itself based on BerkeleyDB.
Re: Berkeley DB performance, profiling, and degradation...
by perrin (Chancellor) on Feb 19, 2002 at 14:51 UTC
    It does seem very strange that DB_File would get slower as the amount of data grows. It's supposed to be constant, and there are people using it with terabyte-size databases. Anyway, here are some things you might try:

    - Use SDBM_File. It is much faster in some situtations. It has a limited record length though. Take a look at these benchmarks from the MLDBM::Sync documentation.
    - Use BerkeleyDB instead of DB_File. It's newer and may perform better.
    - Use Cache::FileBackend, IPC::MM, or Cache::Mmap instead of a dbm file.

        I say it's surprising because a hash algorithm is supposed to maintain a fairly constant lookup time when you put more data into it. Maybe switching between the hash and BTree options of DB_File would make a difference.

        I have used BerkeleyDB with the 3.x series from Sleepycat pretty extensively. The main advantages it offers are in the area of fancier locking and caching. With a single writer and the data on a RAM disk, these aren't likely to make much difference. It's worth a shot though.

      Interesting! Thanks for the link perrin. This looks very promising...

      Very few of my entries will go over the 1024 byte mark, so this might be the ideal solution for me. I will read up on the other options you've mentioned, as well, and get back with results. I like SDBM_File, and will try it first, since I won't have to change any of the access code, just the build up and tear down.

Re: Berkeley DB performance, profiling, and degradation...
by SwellJoe (Scribe) on Feb 20, 2002 at 10:49 UTC
    For those following my plight with interest or amusement, I'm back with results of the switch to MLDBM::Sync...and those results are, disappointingly, even worse than the DB_File results.

    I tried both explicit and implicit locking (explicit should be faster per the docs). So I'm locking and unlocking for each 256 objects entered into the DB. The difference is small but measurable. Anyway, I suspect my objects cross the line into >1024 byte territory too frequently to gain the benefit of this module over DB_File. I figured them for being smaller than that, but I must be wrong...I think I'll do an object size histogram of the data set for my own enlightenment for the next installment.

    So, next verse, quite similar to the first...I'll be converting everything over to the newer BerkeleyDB module, with further experiments in locking and not locking. More profiling to come. And some day, a solution.

Re: Berkeley DB performance, profiling, and degradation...
by SwellJoe (Scribe) on Feb 20, 2002 at 13:22 UTC
    Further along in the journey, but still haven't made it home:

    I've followed the advice of perrin and crazyinsomniac, and switched to BerkeleyDB (which made things just a touch slower than the original DB_File based program--but not enough to worry over). I then followed Randal's advice to lose the tie interface (my reason for sticking with BerkeleyDB rather than reverting to DB_File is that the direct API for BerkeleyDB is better documented and a bit more complete).

    Dropping the reliance on a tied hash for db entries gave me another 10% speed boost! So, the numbers now look like this for the 'long test' (which is still only a small percentage of the full working set):

    Total Elapsed Time = 2010.809 Seconds User+System Time = 1889.459 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 41.8 790.8 2354.8 105111 0.0075 0.0224 main::add_entry 32.9 623.2 621.75 618805 0.0010 0.0010 File::QuickLog::print 15.4 291.0 290.92 55361 0.0053 0.0053 BerkeleyDB::Common::db_ge +t 7.20 136.0 135.74 105111 0.0013 0.0013 BerkeleyDB::Common::db_pu +t 1.92 36.27 1886.0 49750 0.0007 0.0379 main::process_file 0.45 8.574 5684.3 242 0.0354 23.489 main::recurse_dir 0.40 7.528 7.361 56262 0.0001 0.0001 main::find_parent 0.31 5.858 5.545 105111 0.0001 0.0001 Digest::MD5::md5_hex 0.01 0.160 0.160 1 0.1600 0.1600 main::get_cache_dirs 0.01 0.160 0.878 5 0.0320 0.1756 main::BEGIN 0.01 0.150 0.509 1 0.1498 0.5087 IO::import 0.01 0.110 0.109 261 0.0004 0.0004 File::QuickLog::_datetime 0.01 0.100 0.120 64 0.0016 0.0019 Exporter::import 0.00 0.080 0.110 6 0.0133 0.0183 IO::Socket::BEGIN 0.00 0.070 0.070 1 0.0700 0.0700 BerkeleyDB::Term::close_e +verything

    So we've shaved 232 seconds off of the total elapsed runtime...a bit more than a 10% gain. 623 seconds of that is logging calls, all but ~100 seconds of which can be stripped out for the final version. More important, of course, for the long haul, is that the sec/call for all database related functions is reduced by about 15%--I'm hoping this means that the degradation of DB performance is significantly lessened by the use of direct calls rather than a tied interface.

    I'm starting a full index run right now, so hopefully by tomorrow night I'll have more to say on the subject. Here's hoping for 15 entries/sec sustainable...

    Randal gets my thanks this time, for providing the most useful suggestion in this round of tweaks.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://146377]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (12)
As of 2014-12-26 14:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (171 votes), past polls