Beefy Boxes and Bandwidth Generously Provided by pair Networks Ovid
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
I won't comment on the Berkley DB optimisations, others have commented enough on this that i suspect you'll reach your 15/sec mark without issue.

However it is important to note that the greatest performance advances are always aquired by looking at the whole problem, rather than any particular small part of it.

In this case, optimising the Berkley DB is a lot of effort expended on what is really not the problem. A Berkley DB is designed to store data against a string-based key of unknown randomness, and unknown size.

In your favor, you have an excellent knowledge of the operating conditions, how many entries you need etc. You also have an excellent key algorithm for free, md5 has excellent spread and can be used as a hashing entity itself, rather than forcing the Berkley db to re-hash the hash as it were.

Thus my suggestion, if you really require performance, is not to re-write in C/C++, which will not resolve the problem, although it might get you the 15hits/sec you're looking for. I would instead re-write my database backend, in perl, to use something like a fixed-size disk file indexed by the first n characters of the md5 (whatever suits), with start-time configurable parameters for the number of buckets per entry and overflow so that you can tweak those. Then just seek into the thing using the hash and get the data you need.

Yes, its more work, specialisation always is, but it will give you the fastest performance you're likely to get without buying better hardware (on that note, just getting some faster hardware is always a good option :). If you've got the time, do some reading on high-performance data structures and see what you can find. Remember to boost kernel and on-disk caches for added performacne, remember to dump out lots of stats on disk/cache hits etc so you can work out whether you need to implement an intermediate in-memory application-specific cache, or use a different portion of the md5 string because its better distributed (unlikely :)

All these things will buy you serious performance boosts, not little fraction-of-a-second increments, at the cost of really having to understand what you are trying to achieve.

Moving to different languages etc, except in rare circumstances where a given language is actually a big part of the optimal solution, will never replace designing to perform.


In reply to Re: Performance quandary by Anonymous Monk
in thread Performance quandary by SwellJoe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others surveying the Monastery: (7)
    As of 2014-04-16 05:28 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      April first is:







      Results (414 votes), past polls