http://www.perlmonks.org?node_id=147167


in reply to Performance quandary

Some obvious answers you probably covered first:
- get more user time from cpu
- get a real db and maybe offload to another machine
- concentrate on raw writes within a guaranteed time rather than doing processing while writing, etc.
But it seemed like you must have something pathological happening if a 200/sec routine drops below 10/sec.

I'm curious: what happens when you don't use a DB at all and just run the current algorithm with an ordinary in-memory hash?

The first thing to check is that you are not running 5-year-old software. You should build DB_File against Berkeley DB version 4 or, better yet, use the BerkeleyDB module; even then you still don't seem to get all of the functionality of Berkeley DB 4 as provided by Sleepycat. That said, see below for some other things you could do, including using a database like MySQL or PostgreSQL, which also provide some neat tricks.
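
For example, here is a minimal sketch of opening a BTree through the BerkeleyDB module; the filename and cache size are just placeholders:

    use strict;
    use warnings;
    use BerkeleyDB;

    # Open a BTree via the BerkeleyDB module (built against a current
    # libdb) instead of an old DB_File.  Filename and cache size are
    # placeholder values for this sketch.
    my $db = BerkeleyDB::Btree->new(
        -Filename  => 'cache.db',
        -Flags     => DB_CREATE,
        -Cachesize => 32 * 1024 * 1024,
    ) or die "Cannot open cache.db: $BerkeleyDB::Error";

    $db->db_put('some-key', 'some-value');
    my $value;
    $db->db_get('some-key', $value);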

There are new table types and functions in MySQL that might help. I wouldn't have suggested MySQL except that this was brought to my attention when I had the great luck of having drinks with their David Axmark in Tokyo (Open Source Developers Conference at VA) two weeks ago. Anyway, it sounds like your extra processing, and maybe some bugs in old BDB, may push N log N so much closer to an exponential curve that it becomes infeasible for million-object tables at your current CPU level.

- you don't need to do locking yourself: BDB will do page-level locking and InnoDB row-level locking for you.
- IN MEMORY table spec (maybe not such a big deal)
- use placeholders; prepare once, execute many times (see the DBI sketch after this list)
- batch adds (batch selects/inserts) via the API, or as a text-file import from a file or pipe
- create a number of writer processes with a preforked server. This way you can add more than one at a time, and row locking could block as necessary.
- or, save up the adds that need to recurse and do them together (I don't know exactly what you need here).
- pretune the table to your data structure and size. BDB will allocate extra rows for you, but it needs to know the expected size first, so make it give you a lot more extras.
- multiple indices. In particular, InnoDB maintains an insert buffer that is said to speed up this kind of insert load by a factor of 15. This reduces the seeks created by the random distribution of secondary keys. Your current algorithm may actually be doing a seek every single time, whereas a real DB caches the primary index significantly.
- consider a table type like MyISAM that allows you to do concurrent selects and inserts, if you think this is slowing you down with Mysql.
- Use profiling/optimizing built into Mysql or other DBs
- You might be interested in trying an extremely cool database called Firebird (this is Interbase, bug-fixed and open-sourced). Actually, if you use this, then Firebird 2 (now C++) will be out very soon and give you even more of a boost. It is very lightweight, dependable (used in aerospace for a long time), and self-tuning. Firebird/Interbase includes "Declarative Referential Integrity Constraints", which means it will synchronize parent-child relationships between tables. Download the manual from Borland.
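
To make the placeholder and batch-add items above concrete, here is a minimal DBI sketch of "prepare once, execute many" inside one transaction; the database, table, and column names are invented for illustration:

    use strict;
    use warnings;
    use DBI;

    # Invented schema: one prepared INSERT reused for a whole batch,
    # committed once at the end.
    my $dbh = DBI->connect('dbi:mysql:database=cachedb', 'user', 'password',
                           { RaiseError => 1, AutoCommit => 0 });

    my $sth = $dbh->prepare(
        'INSERT INTO entries (md5, parent_md5, url) VALUES (?, ?, ?)');

    my @batch = (
        { md5 => 'abc123', parent_md5 => 'def456', url => 'http://example.com/a' },
        # ... more rows collected from the log tail
    );

    for my $row (@batch) {
        $sth->execute(@{$row}{qw(md5 parent_md5 url)});
    }
    $dbh->commit;        # one commit for the whole batch
    $dbh->disconnect;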

Other things that might help
- The DB will keep a log for you anyway. If you really want a log, maybe make it a DB table. It shouldn't be taking so much time. Are you autoflushing to disk?
- move processing into triggers, in Perl, Java, C, or PL/SQL
- more complex structures, multiple tables, more space for time tradeoffs
- Throw away all those joins and splits. No more strings, throw away all that "http" and "Inserting parent" etc. Too much Perl going on.

Some reading material that might help you:

- Insert speed mods - confirmed what I thought, that "The size of the table slows down the insertion of indexes by N log N (B-trees)." See the tuning link.
- InnoDB

Hope this helps. It would be easier to understand what is happening if we could see exactly what the data looks like. Can you define what these trees are anyway? Is it just two levels, with a parent that is a domain name, plus a number (max is what?) of children? Or are you translating paths or XML info into parent-child relationships? Could this data be described as a 2-dimensional (or n-dim) table?

Re (tilly) 2: Performance quandary
by tilly (Archbishop) on Feb 24, 2002 at 20:43 UTC
    That MySQL comment bothers me.

    With a properly designed BTree, the access time should scale logarithmically with the size of the table. The n*log(n) should be the overall time to build up the whole table (n operations, each of which is log(n)).

    Typical data sets range between, say, 1000 and 10**9 rows. Over that range log(n) only changes by a factor of 3, so for all intents and purposes it is just some constant. In real databases the access time is dominated by the time to seek to disk, and so (as my link in my previous post pointed out) the locality of reference of a BTREE can be a good win over hashing because it is more likely to hit pages in cache. Sure, asymptotically that log(n) is infinitely worse than any possible constant, but real life doesn't tend to reach that asymptotic nightmare. (Besides which BTREEs have a good worst-case guaranteed performance. Hashes don't.)
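
    A quick back-of-envelope sketch of that factor of 3, nothing more:

        use strict;
        use warnings;

        # Over realistic table sizes, log(n) is effectively a small constant.
        for my $n (1_000, 1_000_000, 1_000_000_000) {
            printf "n = %13d   log2(n) = %.1f\n", $n, log($n) / log(2);
        }
        # Prints roughly 10, 20 and 30 -- a 3x spread across six orders
        # of magnitude, so BTree depth is not where the time goes.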

    On to other issues.

    Using a RAM disk does a lot to remove filesystem overhead. That should, among other things, catch most of the possible speed wins from extracting parallelism in commits (through insert buffers and other tricks). Come to think of it, most of the rest of the filesystem overhead can be removed just by naming your file undef, which tells Berkeley DB that it is dealing with an in memory database, removing the OS's filesystem abstraction layers from the picture.
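
    For example, a minimal sketch of the undef-filename trick (the data here is made up):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl;

        # Passing undef as the filename asks Berkeley DB for an in-memory
        # (temporary) database, so the OS filesystem layer drops out.
        my %cache;
        tie %cache, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $DB_BTREE
            or die "Cannot create in-memory BTree: $!";

        $cache{'example-key'} = 'example-value';   # made-up data
        print $cache{'example-key'}, "\n";
        untie %cache;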

    You do have a good point about doing too much Perl in the process. Personally I strongly suspect that just changing to a better data structure will win the day. Alternatively, if you can, using native hashes is better than pulling in Berkeley DB. But given Perl's memory overhead, I would be concerned about hitting the 32-bit addressing barrier, even if you have enough RAM in the machine to do it. (A wall we will be hearing about more and more often as time goes by...)

      it seems to me you are wanting an in-memory data structure, but you are tying the data to a database structure. if you are going to go this route, i would suggest looking into the index optimizations of BDB as tilly suggested with an 'undef' file, MySQL HEAP tables, and pure-perl structures like DBD::AnyData. A good index system will make or break your seek time, and you are seeking a lot, aren't you? i'm not sure how much overhead a connection to BDB or MySQL would add, but if it is significant, a DBD::AnyData in-memory table may be less by virtue of running as a perl module and not connecting to anything at all.
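
      for what it's worth, a rough sketch of what a MySQL HEAP (in-memory) table of that era looks like -- the schema here is invented:

          use strict;
          use warnings;
          use DBI;

          # Invented schema: an in-memory (HEAP) lookup table keyed on the
          # md5, so exists?-style seeks stay cheap.
          my $dbh = DBI->connect('dbi:mysql:database=cachedb', 'user', 'password',
                                 { RaiseError => 1 });

          $dbh->do(q{
              CREATE TABLE entry_lookup (
                  md5      CHAR(32) NOT NULL,
                  children INT      NOT NULL DEFAULT 0,
                  PRIMARY KEY (md5)
              ) TYPE=HEAP
          });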

      if you are doing concurrent reads/writes, the locking is of importance to you as well. you probably want to avoid anything that locks more than just the relevant row. if you can guarantee integrity another way, or 100% accuracy is unnecessary, maybe you can avoid locking the table at all.

      if you aren't doing concurrent processing, since you are farming the data-storage out to BDB you might be able to get away with (A) building the trees you have to make in a child process so several can be done concurrently, and (B) splitting your tables so you can do concurrent reads/writes. this probably won't get you much, but it would allow you to avoid some locking problems if your tables do lock, and it would potentially allow you to run a separate DB and/or tree building on a second processor/machine to share the load.

      you also want to avoid anything resembling transaction logging as that inevitably is designed to trade speed for data integrity. i don't know BDB well, but in MySQL you would want to avoid InnoDB tables just for this reason.
Re: Re: Performance quandary
by SwellJoe (Scribe) on Feb 26, 2002 at 14:24 UTC
    Thanks for your comments, mattr.

    To answer your first few comments:

    - get more user time from cpu
    Not an option. We have an installed base of servers--this is a fixed resource. And this task is sharing the machine with a CPU-hungry daemon that always gets priority (Squid).

    - get a real db and maybe offload to another machine
    Another machine is not an option. Again with the fixed-resources conundrum. A real db is an option, but I believe if I fix my "pathological" use of BerkeleyDB I won't need to, and would actually lose performance in the bargain. A real relational database can never be as fast as an equally tuned key:value database when used for storing key:value information--the relational layer, the SQL layer, and a bunch of other stuff ensures that.

    - concentrate on raw writes within a guaranteed time rather than doing processing while writing, etc.
    This program will already run at an extreme niceness level (the Squid gets what she wants, everybody else waits his turn). The File::Tail module is handling the 'when you've got the time, I'd like to do some work' catching up stuff. This program is doing an important job, but one that has to take a back seat--thus the reason efficiency is important, it can't fall too far behind and still provide reasonable functionality.

    But it seemed like you must have something pathological happening if a 200/sec routine drops below 10/sec.
    I agree. There is something pathological--I believe in the parent insertion routines. It's just a matter of how to fix the pathology without killing the patient.

    Next interesting thought:

    - Throw away all those joins and splits. No more strings, throw away all that "http" and "Inserting parent" etc. Too much Perl going on.
    And replace them with what? The functionality is complete at this point--but is too slow. We can't lose functionality to fix the speed, as a useless program that runs really fast is still a useless program. ;-) I am beginning to suspect that using a more complex database that can handle some of the 'think' for me, rather than doing it all myself in perl, might be worth the tradeoff. If I were using a relational DB, the parent->number_of_children could be incremented by accessing that particular entry rather than pulling in and parsing the whole parent structure--behind the scenes it still has complexity, but maybe the MySQL folks know how to handle that complexity much better than perl and I do.
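
    For illustration, the in-place increment would look something like this through DBI (the schema and names are invented):

        use strict;
        use warnings;
        use DBI;

        # Invented schema: let the database do the read-modify-write of the
        # child counter instead of pulling the whole parent record into perl.
        my $dbh = DBI->connect('dbi:mysql:database=cachedb', 'user', 'password',
                               { RaiseError => 1 });

        my $bump = $dbh->prepare(
            'UPDATE entries SET number_of_children = number_of_children + 1
              WHERE md5 = ?');

        my $parent_md5 = 'abc123';    # placeholder parent key
        $bump->execute($parent_md5);  # one indexed update, no split/join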

    As I've said in another post this morning, the problem I believe, is boiling down to the parent->child relationship. How do I keep it current, accurate, and make it go fast? The most expensive thing in most add_entry calls is probably the second add_entry call that is made to handle the parent relationship. So if I can make the parent relationship cheap, I can fix the slowdown over time. I think.

    Anyway, thanks for the links and the pointers. I'm going to stick with BerkeleyDB for the time being, and attempt to make my code a little less pathological.

    The first step for making things less pathological is to:
    - Add an in-memory hash to speed exists? checks for the parent.
    - Reduce the size of the parent entries, probably converting to a multiple-entry system, to prevent changing the size of the parent (a number_of_children field with the kids referenced by $parentmd5:number). Hopefully this is cheaper than the current split/join fun that happens when inserting a child into a parent's child list.
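
    A rough sketch of both changes together (the key layout and names are invented):

        use strict;
        use warnings;
        use BerkeleyDB;

        # In-memory exists? cache plus one key per child, so the parent
        # record itself never has to be rewritten -- only its counter.
        my %seen_parent;
        my $db = BerkeleyDB::Btree->new(
            -Filename => 'cache.db',           # placeholder filename
            -Flags    => DB_CREATE,
        ) or die "Cannot open cache.db: $BerkeleyDB::Error";

        sub add_child {
            my ($parentmd5, $childmd5) = @_;

            unless ($seen_parent{$parentmd5}) {
                my $existing;
                if ($db->db_get("$parentmd5:count", $existing) != 0) {
                    $db->db_put("$parentmd5:count", 0);   # brand-new parent
                }
                $seen_parent{$parentmd5} = 1;
            }

            my $count;
            $db->db_get("$parentmd5:count", $count);
            $db->db_put("$parentmd5:" . ($count + 1), $childmd5);
            $db->db_put("$parentmd5:count", $count + 1);
        }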