
Re: Berkeley DB performance, profiling, and degradation...

by perrin (Chancellor)
on Feb 19, 2002 at 14:51 UTC (#146386)

in reply to Berkeley DB performance, profiling, and degradation...

It does seem very strange that DB_File would get slower as the amount of data grows. Lookup time is supposed to stay roughly constant, and there are people using it with terabyte-size databases. Anyway, here are some things you might try:

- Use SDBM_File. It is much faster in some situations, though it has a limited record length. Take a look at these benchmarks from the MLDBM::Sync documentation.
- Use BerkeleyDB instead of DB_File. It's newer and may perform better.
- Use Cache::FileBackend, IPC::MM, or Cache::Mmap instead of a dbm file.
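
For the first suggestion, here is a minimal sketch of tying a hash to SDBM_File. The file name, key, and value are made up for illustration; the point is only that the tied hash is read and written like any other hash, so code written against a DB_File tie should mostly carry over.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical location; SDBM creates cache_index.pag and cache_index.dir here.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/cache_index";

tie(my %index, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

# Store and fetch through the ordinary hash interface.
$index{'d41d8cd98f00b204e9800998ecf8427e'} = 'http://example.com/';
my $url = $index{'d41d8cd98f00b204e9800998ecf8427e'};
print "$url\n";

untie %index;
```

Only the tie and untie lines are specific to the backend, which is why swapping dbm modules is usually a cheap experiment.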

Re: Re: Berkeley DB performance, profiling, and degradation...
by crazyinsomniac (Prior) on Feb 19, 2002 at 15:18 UTC
      I say it's surprising because a hash algorithm is supposed to maintain a fairly constant lookup time when you put more data into it. Maybe switching between the hash and BTree options of DB_File would make a difference.

      I have used BerkeleyDB with the 3.x series from Sleepycat pretty extensively. The main advantages it offers are in the area of fancier locking and caching. With a single writer and the data on a RAM disk, these aren't likely to make much difference. It's worth a shot though.

        This was my assumption as well (that lookups should be roughly constant at some point). But clearly it is not so.

        I've already tried switching to BTREE with no measurable result--I think having the db in RAM nullifies all of the tweaks that are available (like cachesize, etc.).
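
        For anyone who wants to repeat the hash versus BTREE comparison, a rough sketch follows. The record count, key format, and cachesize value are arbitrary assumptions, not the settings used in this thread, and the timings it prints will depend entirely on the machine and the data.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

my $dir = tempdir(CLEANUP => 1);

# Time $n stores followed by $n fetches against one DB_File access method.
# Returns the number of records found and the elapsed seconds.
sub time_backend {
    my ($label, $info, $n) = @_;
    my $file = "$dir/$label.db";
    tie(my %db, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $info)
        or die "Cannot tie $file: $!";

    my $start = time;
    $db{ sprintf 'key%08d', $_ } = "value $_" for 1 .. $n;
    my $found = 0;
    for my $i (1 .. $n) {
        $found++ if defined $db{ sprintf 'key%08d', $i };
    }
    my $elapsed = time - $start;

    untie %db;
    return ($found, $elapsed);
}

# Fresh info objects, so the tweak below does not alter the shared
# $DB_HASH and $DB_BTREE defaults.
my $hash_info  = DB_File::HASHINFO->new;
my $btree_info = DB_File::BTREEINFO->new;
$btree_info->{cachesize} = 4 * 1024 * 1024;   # illustrative value only

for my $case ( [ 'hash', $hash_info ], [ 'btree', $btree_info ] ) {
    my ($found, $elapsed) = time_backend(@$case, 10_000);
    printf "%-5s %d records, %.3fs\n", $case->[0], $found, $elapsed;
}
```

        Running it at several record counts is one way to see whether per-record time really grows with the size of the database.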

        One thing I have thought of, which might be helpful, is that I already have a hash value which is my key in the database. As I understand it, Berkeley DB then computes a new hash from my key to store the object. Any chance I could use my own hashes as record numbers or similar? (The hash I have for a key is a 32-byte hex-encoded MD5, which matches the Squid hash key for a given object.) That would avoid the key generation part of STORE and FETCH. It might not be a benefit, though... I will worry more about it if SDBM_File doesn't fix my problems.
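
        To make the key size concrete, here is a small sketch using Digest::MD5. It assumes the key is the hex form of an MD5 digest of some string; the URL is hypothetical, and how Squid actually derives its key is not shown in this thread. It does not bypass Berkeley DB's own hashing. It only shows that the 32-byte hex key corresponds to 16 raw bytes, which could be stored as a shorter key. Whether a shorter key helps here is untested.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $url = 'http://example.com/';     # hypothetical object URL
my $hex = md5_hex($url);             # 32 hex characters
my $raw = pack('H*', $hex);          # the same digest as 16 raw bytes

printf "hex key: %d bytes, raw key: %d bytes\n", length($hex), length($raw);

# The raw form converts back to the hex form without loss.
my $back = unpack('H*', $raw);
print "round trip ok\n" if $back eq $hex;
```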

Re: Re: Berkeley DB performance, profiling, and degradation...
by SwellJoe (Scribe) on Feb 19, 2002 at 22:35 UTC
    Interesting! Thanks for the link perrin. This looks very promising...

    Very few of my entries will go over the 1024 byte mark, so this might be the ideal solution for me. I will read up on the other options you've mentioned, as well, and get back with results. I like SDBM_File, and will try it first, since I won't have to change any of the access code, just the build up and tear down.
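
    A sketch of a guard for the few oversized entries. Note that the SDBM_File documentation puts the limit at 1008 bytes for the key and value combined, slightly under the 1024 mentioned above. The helper name and the sample sizes are made up for illustration, and the strings are assumed to be plain byte strings.

```perl
use strict;
use warnings;

# Documented SDBM_File limit: key length plus value length
# may not exceed 1008 bytes.
use constant SDBM_PAIR_MAX => 1008;

# Hypothetical helper: true if the pair is small enough to store in SDBM.
sub fits_in_sdbm {
    my ($key, $value) = @_;
    return length($key) + length($value) <= SDBM_PAIR_MAX;
}

my $key = 'd41d8cd98f00b204e9800998ecf8427e';
for my $size (100, 2000) {
    my $value = 'x' x $size;
    printf "%4d-byte value: %s\n", $size,
        fits_in_sdbm($key, $value) ? 'fits' : 'too large for SDBM';
}
```

    Entries that fail the check would need to be skipped, truncated, or kept in a second store; which of those is acceptable depends on the application.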
