Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
well, if you are working on a 32bit machine, and you read 24bits worth of the md5 hash as your index to the array i get (32 * 2^24) bits of array, unpopulated, or 64MB. assuming you are reading the md5 as an 8bit ASCII string then you get three chars worth of it, which is a bit sad, but most of the 8 bits are redundant for the md5, so if you are lucky you should be able to compress each char to 6bits, via substitution, which gives you a fourth char for the same space. if you can't find four chars that vary enough between md5s (the last four chars?) can you double hash the keys for more variety? SQUID gives you md5, does your end program search by md5? if so you can still store the data by another key, you just have to store the original md5 with the rest of the data to match on.

anyhow, if you run your data structure as a array of lists, with 16 million places in the array to reference, and 1-3 million actual items to index, you are below any load threshhold, so seeks will require on average only one or two compare operations. so long as you can keep the data spread evenly across the array, it should be well worth the memory used. the fact that it would be sparsely populated is of course the goal of what i was suggesting, then you have virtually constant seek time from zero to 3 mil data points.

the total memory cost for this is 64MB for the array itself, again, assuming 32bit pointers, plus the 100-200MB data set you are already dealing with.



on the FETCH/STORE issue, what i meant was: don't build the tree in this program, just feed the data into tables. build the trees in the program that is reading the tables. then you aren't doing a search for parents/children, a store, then another search for the trees, you are just storing the data unprocessed, and doing one search for everything and building the tree then. if you're end user program can be made to build the trees itself you can cut half the searching...

In reply to Re: Re: Re: Performance quandary by Xanatax
in thread Performance quandary by SwellJoe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2024-04-25 20:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found