Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Interesting thoughts Xanatax,

From an academic standpoint (since I'm writing this program to learn as much as to solve a problem, that definitely makes the academic valuable to me), I like the idea of playing with different data structures--even hugely different, such as an array in memory, indexed by part of the MD5. There are at least two problems I can see with this approach (I'm not wholly illiterate on data structures and indexing, and other fun things--I am a Squid nerd, after all), and I'd welcome thoughts or pointers to docs on solving them:

We can't have 32 bytes worth of array entries, so we'd index on a part of the MD5. Even so, to guarantee uniqueness we need at least a few bytes--which leads to an extremely huge, but very sparsely populated array. Not ideal, even if memory is not an issue--I think memory would become an issue if we're using an array /that/ big. So how to handle sparse data structures of this sort?

Persistence and concurrency. We have to be able to read the data from another process (the indexing itself is not the end, it is merely the means). So how to dump this out, in near realtime, in a form that is quickly accessible to another process? We don't need to modify the data from the other process, just read it.

As for the no-FETCH, STORE everything proposal...I don't get it. I think it is missing the vital parent->child relationship component, if I understand you. I have considered deleting the parent each time and rewriting with the new child, but I suspect that is an optimization that the BDB handles already (and probably better than I can in Perl--since it is probably smart enough to know when it can insert into the existing 4kb space and when it will have to expand the space).

In reply to Re: Re: Performance quandary by SwellJoe
in thread Performance quandary by SwellJoe

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others imbibing at the Monastery: (8)
    As of 2018-06-21 19:48 GMT
    Find Nodes?
      Voting Booth?
      Should cpanminus be part of the standard Perl release?

      Results (119 votes). Check out past polls.