<?xml version="1.0" encoding="windows-1252"?>
<node id="1006783" title="Re^2: &quot;Out of memory&quot; problem" created="2012-12-02 23:50:16" updated="2012-12-02 23:50:16">
<type id="11">
note</type>
<author id="647953">
sundialsvc4</author>
<data>
<field name="doctext">
&lt;p&gt;
Agree with [BrowserUK] ... 500 &lt;i&gt;million&lt;/i&gt; integers is a lot to index, and if you aren&amp;rsquo;t &lt;em&gt;searching&lt;/em&gt; for anything, it&amp;rsquo;s pure overhead to get &amp;ldquo;sorted answers&amp;rdquo; that way. &amp;nbsp; But a good external sorting package would have no particular difficulty.
&lt;/p&gt;&lt;p&gt;
Ideally, you would arrange the whole data-processing flow that includes this file so that everything gets put into a known sort-sequence early, and things are done in such a way as to &lt;em&gt;keep&lt;/em&gt; it that way from one step to the next. &amp;nbsp; So you might have a 500 million record master-file which is simply &amp;ldquo;known to be&amp;rdquo; sorted, and you manipulate that file in ways that require it to be that way and which keep it that way. &amp;nbsp; This avoids searching, and it avoids repetitive sorting. &amp;nbsp; It also avoids indexes and the overhead of the same. &amp;nbsp; At the same time, though, you do &lt;em&gt;not&lt;/em&gt; want to schlep a bunch of data through disk-reads and disk-writes if you are not actually doing anything with most of it.
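&lt;/p&gt;&lt;p&gt;
As a rough illustration of &amp;ldquo;keeping it that way,&amp;rdquo; here is a minimal Python sketch that folds a sorted batch of new records into a sorted master-file in one sequential pass. &amp;nbsp; The function name, the file paths and the one-integer-per-line layout are all assumptions made for the example, not anything taken from the original question.
&lt;/p&gt;

```python
import heapq


def merge_sorted_files(master_path, updates_path, out_path):
    """Merge two files that are each ALREADY sorted ascending.

    Assumed layout (hypothetical): one integer per line, no blank lines.
    Both inputs are read sequentially and only one line from each is held
    in memory at a time, so no index and no re-sort is needed.
    """
    with open(master_path) as master, \
            open(updates_path) as updates, \
            open(out_path, "w") as out:
        master_values = (int(line) for line in master)
        update_values = (int(line) for line in updates)
        # heapq.merge lazily interleaves sorted inputs into sorted output.
        for value in heapq.merge(master_values, update_values):
            out.write(f"{value}\n")
```

&lt;p&gt;
The output is itself sorted, so it can serve as the new master-file for the next step. &amp;nbsp; Note that this still rewrites the whole file, which is only worth doing when the batch of updates is large enough to justify the pass.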
&lt;/p&gt;&lt;p&gt;
Obviously, RAM is the fastest resource and it avoids I/O entirely ... provided that virtual-memory swapping is not going on, which can be a killer. &amp;nbsp; Your strategy entirely depends on your situation, and sometimes you can get a lot of mileage simply by chopping a large file into smaller chunks so that each one does fit in the RAM that you have without swapping.
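&lt;/p&gt;&lt;p&gt;
The chunking idea is also the heart of an external sort, and it can be sketched in a few lines of Python. &amp;nbsp; This is only a sketch under stated assumptions: the function name, the one-integer-per-line layout and the default chunk size are hypothetical, and a real external sorting package will be considerably more careful than this.
&lt;/p&gt;

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, chunk_size=1_000_000):
    """Sort a file of integers (one per line, assumed) in bounded memory.

    chunk_size is a hypothetical tuning knob: pick it so that one chunk
    fits comfortably in physical RAM, with no swapping.
    """
    run_paths = []
    with open(in_path) as src:
        while True:
            # Read at most chunk_size lines; only this chunk is in memory.
            chunk = [int(line) for line in islice(src, chunk_size)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{v}\n" for v in chunk)
            run_paths.append(path)

    # Merge the sorted runs sequentially. Every run file is open at once,
    # so a very large number of runs could hit the OS open-file limit.
    runs = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as out:
            streams = [(int(line) for line in f) for f in runs]
            out.writelines(f"{v}\n" for v in heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)
```

&lt;p&gt;
Each chunk is sorted entirely in RAM, and the final merge touches each run strictly sequentially, so the only I/O is the kind you planned for.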
&lt;/p&gt;&lt;p&gt;
The key here is ... &amp;ldquo;without swapping.&amp;rdquo; &amp;nbsp; If you are doing high volume processing &amp;ldquo;in memory&amp;rdquo; to avoid I/O, but push the limit so that you start to swap, not only &lt;em&gt;is&lt;/em&gt; &amp;ldquo;I/O going on,&amp;rdquo; but it can be of a particularly murderous kind.
&lt;/p&gt;</field>
<field name="root_node">
1006521</field>
<field name="parent_node">
1006527</field>
</data>
</node>
