PerlMonks  

I may have missed something here, so the following approach might be oversimplified.

In the first pass, I'd write the data to a flat file made up of lines of key-value pairs. The key would represent the "filename" and the value would be one of the OP's "4-byte" values. Make sure a new file is started before the maximum file size allowed by the OS or the filesystem is reached.
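
A minimal Python sketch of that first pass, assuming tab-separated text lines, values stored as decimal integers, and a hypothetical `max_bytes` limit that you would pick safely below the OS/filesystem maximum. The class name and the `prefix.0000`, `prefix.0001`, ... file naming are illustrative only.

```python
class RotatingPairWriter:
    """First pass (sketch): append "key<TAB>value" lines to a numbered set of
    files, starting a new file before the next line would push the current
    one past max_bytes. Naming scheme and text format are assumptions."""

    def __init__(self, prefix, max_bytes):
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.paths = []              # every file written so far, in order
        self._open_next()

    def _open_next(self):
        path = f"{self.prefix}.{len(self.paths):04d}"
        self.paths.append(path)
        self.fh = open(path, "wb")
        self.size = 0

    def write(self, key, value):
        # The OP's values are 4-byte quantities; stored here as decimal text.
        if not 0 <= value < 2**32:
            raise ValueError("value does not fit in 4 bytes")
        line = f"{key}\t{value}\n".encode("utf-8")
        if len(line) > self.max_bytes:
            raise ValueError("a single line exceeds max_bytes")
        if self.size + len(line) > self.max_bytes:
            self.fh.close()          # roll over before the limit is reached
            self._open_next()
        self.fh.write(line)          # always appended: keys may arrive in any order
        self.size += len(line)

    def close(self):
        self.fh.close()
```

Every value becomes one line at the end of the current file, so nothing is preallocated per key.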

If the "filename" is too long, I would create a separate file mapping each "filename" to a shorter key. In the data file, each key will occur as many times as there are values for it, each time on a separate line. The order of the values (should it matter) is preserved by the order of the lines.
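
One way to do the mapping, sketched in Python: each new "filename" gets the next counter value encoded in base 36 as its short key, and the pair is recorded in a separate map file. The base-36 scheme, the `shortkey<TAB>filename` layout, and the names `KeyMap` and `load_key_map` are hypothetical choices; the sketch also assumes filenames contain no newlines.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    """Encode a non-negative integer as a compact base-36 string."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))

class KeyMap:
    """Assign each long "filename" a short key and record the mapping
    as "shortkey<TAB>filename" lines in a separate file (sketch)."""

    def __init__(self, map_path):
        self.keys = {}
        self.fh = open(map_path, "w", encoding="utf-8")

    def short_key(self, name):
        key = self.keys.get(name)
        if key is None:
            key = to_base36(len(self.keys))   # "0", "1", ..., "z", "10", ...
            self.keys[name] = key
            self.fh.write(f"{key}\t{name}\n")
        return key

    def close(self):
        self.fh.close()

def load_key_map(map_path):
    """Read the mapping back as {shortkey: filename}."""
    with open(map_path, encoding="utf-8") as fh:
        # maxsplit=1: short keys never contain tabs, filenames might.
        return dict(line.rstrip("\n").split("\t", 1) for line in fh)
```

The short key returned by `short_key()` is what would go into the first-pass data lines in place of the long "filename".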

In the second pass, once all the values have been written to the file(set), scan it once for each key and write all of that key's values into a single variable-length record in a new target file(set). In the third pass, build the index on the target file(set).
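
A Python sketch of the second and third passes, assuming the tab-separated decimal text produced in the first pass and a single target file. The binary record layout is a hypothetical choice: a 2-byte key length, a 4-byte value count, the key bytes, then the values as 4-byte little-endian unsigned integers. As described above, the second pass rescans all input once per key, which is slow but keeps only the key list and one key's values in memory.

```python
import struct

HEADER = struct.Struct("<HI")   # key length (2 bytes), value count (4 bytes)

def iter_pairs(paths):
    """Yield (key, value) from the first-pass files in file and line order."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                key, value = line.rstrip("\n").rsplit("\t", 1)
                yield key, int(value)

def consolidate(paths, target_path):
    """Second pass: one variable-length record per key, rescanning per key."""
    keys = list(dict.fromkeys(k for k, _ in iter_pairs(paths)))  # first-seen order
    with open(target_path, "wb") as out:
        for key in keys:
            values = [v for k, v in iter_pairs(paths) if k == key]  # line order kept
            kb = key.encode("utf-8")
            out.write(HEADER.pack(len(kb), len(values)))
            out.write(kb)
            out.write(struct.pack(f"<{len(values)}I", *values))

def build_index(target_path):
    """Third pass: walk the records and map each key to its byte offset."""
    index = {}
    with open(target_path, "rb") as fh:
        while True:
            offset = fh.tell()
            head = fh.read(HEADER.size)
            if not head:
                break
            klen, count = HEADER.unpack(head)
            index[fh.read(klen).decode("utf-8")] = offset
            fh.seek(4 * count, 1)    # skip over this record's values
    return index

def lookup(target_path, index, key):
    """Fetch all values for one key with a single seek."""
    with open(target_path, "rb") as fh:
        fh.seek(index[key])
        klen, count = HEADER.unpack(fh.read(HEADER.size))
        fh.seek(klen, 1)
        return list(struct.unpack(f"<{count}I", fh.read(4 * count)))
```

Here `build_index` returns an in-memory dict; writing it out to its own file would make the index persistent.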

This way, the first-pass file(set) accepts values for keys in any order, simply appending them to the end of the file, and does not waste space on large records that won't be needed most of the time. The second pass will take a lot of time, but as I understand it, time is not the issue here.

Generally, if space is a major consideration, a DBMS is the last thing I would look at. It carries too much overhead, the price of making it work with all kinds of data structures.

Update: Corrected a spelling mistake. Added a comment on DBs.


In reply to Re^2: Combining Ultra-Dynamic Files to Avoid Clustering (Ideas?) by mhi
in thread Combining Ultra-Dynamic Files to Avoid Clustering (Ideas?) by rjahrman
