<?xml version="1.0" encoding="windows-1252"?>
<node id="932260" title="Re^3: Store a huge amount of data on disk" created="2011-10-18 20:16:44" updated="2011-10-18 20:16:44">
<type id="11">
note</type>
<author id="171588">
BrowserUk</author>
<data>
<field name="doctext">

&lt;p&gt;Sounds like you're indexing your data by a hex-encoded digest?

&lt;p&gt;Given that you have 3 variable &amp; possibly huge chunks -- which most RDBMSs handle by writing to the filesystem anyway -- associated with each index key, and your selection criteria are both fixed &amp; simple, I'd use the filesystem.

&lt;p&gt;Subdivide the key into chunks that make individual directories contain at most a reasonable number of entries and then store the 3 sections in files at the deepest level. 

&lt;p&gt;By splitting a 32-character hex digest into 4-char chunks, no directory has more than 65,536 (16^4) entries. (2-char chunks would cap each directory at 256 entries, at the cost of 16 levels instead of 8.) The file-system cache will hold the most frequently used levels, and the rest will be both fast to read from disk and quick to search, especially if your file-system hashes its directory entries.
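
&lt;p&gt;As a rough check on the fan-out arithmetic, here is a minimal sketch in Python. The function name &lt;c&gt;fanout&lt;/c&gt; and the constant &lt;c&gt;DIGEST_LEN&lt;/c&gt; are my own illustrative names, and it assumes a 32-character (128-bit) hex digest. Each hex character has 16 possible values, so a chunk of width w allows at most 16**w names per directory:

```python
# Fan-out arithmetic for a hex digest split into fixed-width chunks.
# Assumption: the digest is 32 hex characters (128 bits).
DIGEST_LEN = 32


def fanout(chunk_width):
    """Return (max entries per directory, number of directory levels)."""
    max_entries = 16 ** chunk_width        # 16 possible values per hex character
    levels = DIGEST_LEN // chunk_width     # one directory level per chunk
    return max_entries, levels


for width in (2, 4):
    entries, levels = fanout(width)
    print(f"{width}-char chunks: up to {entries} entries per directory, {levels} levels")
```

&lt;p&gt;Narrower chunks mean smaller directories but a deeper tree; wider chunks mean the reverse. Which is better probably depends on how well your file-system copes with large directories.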

&lt;p&gt;I'd write the individual chunks of the two text parts in separate files unless they will always be loaded as a single entity, in which case it might be slightly faster to concatenate them.

&lt;P&gt;Overall, given a digest of &lt;c&gt;8fbe7eb8c04c744406cca0aeb67e4f7f&lt;/c&gt;, I'd lay the directory structure out like this:&lt;code&gt;
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/meta.txt
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.000
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.001
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.002
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1....

/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2.000
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2.001
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2....
&lt;/code&gt;
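
&lt;p&gt;A minimal sketch of how that layout could be driven from code, in Python for illustration. The helper names (&lt;c&gt;digest_dir&lt;/c&gt;, &lt;c&gt;store_chunks&lt;/c&gt;, &lt;c&gt;load_chunks&lt;/c&gt;) are hypothetical, and it assumes lowercase 32-character hex digests and fewer than 1000 chunks per text part, so the zero-padded suffixes sort correctly:

```python
import os
import re
import tempfile

CHUNK_WIDTH = 4  # hex characters per directory level, matching the layout shown


def digest_dir(root, digest, width=CHUNK_WIDTH):
    """Map a hex digest to its leaf directory under root."""
    if not re.fullmatch(r"[0-9a-f]{32}", digest):
        raise ValueError("expected a 32-character lowercase hex digest")
    parts = [digest[i:i + width] for i in range(0, len(digest), width)]
    return os.path.join(root, *parts)


def store_chunks(root, digest, name, chunks):
    """Write each chunk to name.000, name.001, ... in the digest's directory."""
    leaf = digest_dir(root, digest)
    os.makedirs(leaf, exist_ok=True)  # creates all intermediate levels as needed
    for n, chunk in enumerate(chunks):
        with open(os.path.join(leaf, f"{name}.{n:03d}"), "wb") as fh:
            fh.write(chunk)


def load_chunks(root, digest, name):
    """Read the numbered chunks of one text part back, in order."""
    leaf = digest_dir(root, digest)
    # Zero-padded suffixes make a plain lexical sort match numeric order.
    files = sorted(f for f in os.listdir(leaf) if f.startswith(name + "."))
    result = []
    for f in files:
        with open(os.path.join(leaf, f), "rb") as fh:
            result.append(fh.read())
    return result


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stand-in for /data
    digest = "8fbe7eb8c04c744406cca0aeb67e4f7f"
    store_chunks(root, digest, "text1", [b"first", b"second"])
    print(digest_dir(root, digest))
    print(load_chunks(root, digest, "text1"))
```

&lt;p&gt;Lookup needs no index at all: the digest is the path, so finding a record costs one open per file rather than a search.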



&lt;div class="pmsig"&gt;&lt;div class="pmsig-171588"&gt;
&lt;hr /&gt;
&lt;font size=1 &gt;
&lt;div&gt;With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'&lt;/div&gt;
&lt;div&gt;Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.&lt;/div&gt;
&lt;div&gt;"Science is about questioning the status quo. Questioning authority". &lt;/div&gt;
&lt;div&gt;In the absence of evidence, opinion is indistinguishable from prejudice.&lt;/div&gt;
&lt;/font&gt;

&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
932175</field>
<field name="parent_node">
932183</field>
</data>
</node>
