Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
That MySQL comment bothers me.

With a properly designed BTree, the access time should scale logarithmically with the size of the table. The n*log(n) should be the overall time to build up the whole table (n operations, each of which is log(n)).

Typical data sets range between, say, 1000 and 10**9 rows. Over that range log(n) only changes by a factor of 3, so for all intents and purposes it is just some constant. In real databases the access time is dominated by the time to seek to disk, and so (as my link in my previous post pointed out) the locality of reference of a BTREE can be a good win over hashing because it is more likely to hit pages in cache. Sure, asymptotically that log(n) is infinitely worse than any possible constant, but real life doesn't tend to reach that asymptotic nightmare. (Besides which BTREEs have a good worst-case guaranteed performance. Hashes don't.)

On to other issues.

Using a RAM disk does a lot to remove filesystem overhead. That should, among other things, catch most of the possible speed wins from extracting parallelism in commits (through insert buffers and other tricks). Come to think of it, most of the rest of the filesystem overhead can be removed just by naming your file undef, which tells Berkeley DB that it is dealing with an in memory database, removing the OS's filesystem abstraction layers from the picture.

You do have a good point about doing too much Perl in the processs. Personally I strongly suspect that just changing to a better data structure will win the day. Alternately, if you can, using native hashes is better than pulling in Berkeley DB. But given Perl's memory overhead, I would be concerned about hitting the 32-bit addressing barrier, even if you have enough RAM in the machine to do it. (A wall we will be hearing about more and more often as time goes by...)


In reply to Re (tilly) 2: Performance quandary by tilly
in thread Performance quandary by SwellJoe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (2)
As of 2024-04-24 17:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found