<?xml version="1.0" encoding="windows-1252"?>
<node id="1001125" title="Re: Optimizing I/O intensive subroutine" created="2012-10-26 12:43:45" updated="2012-10-26 12:43:45">
<type id="11">
note</type>
<author id="171588">
BrowserUk</author>
<data>
<field name="doctext">
&lt;p&gt;Running your routine on 7 files of 200,000 lines apiece (with limit = 1000) takes just 10.5 seconds; on 100 files of 200,000 lines it takes 145 seconds on my machine.

&lt;p&gt;That shows (as expected) that the runtime is pretty much linear with respect to the number of files: roughly 1.5 seconds per file in both cases.

&lt;p&gt;Your figures (40s for 7 files and 1500s for 100) therefore suggest that the majority of the time is being spent outside of this routine, doing something non-linear.
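
&lt;p&gt;To put rough numbers on that, here is a quick sketch (in Python, purely for the arithmetic; the variable and function names are my own invention) that divides each quoted timing by its file count. If the work were linear, the per-file cost would stay about the same as the file count grows.

```python
# Timings quoted in this thread, as (number of files, seconds) pairs.
mine = [(7, 10.5), (100, 145.0)]      # the routine run on its own
yours = [(7, 40.0), (100, 1500.0)]    # the figures reported for the whole program

def per_file(timings):
    """Return the seconds-per-file cost for each (files, seconds) pair."""
    return [seconds / files for files, seconds in timings]

def growth(rates):
    """Ratio of the last per-file cost to the first; about 1.0 means linear."""
    return rates[-1] / rates[0]

mine_rate = per_file(mine)
yours_rate = per_file(yours)

for label, rates in (("routine alone", mine_rate), ("whole program", yours_rate)):
    print(label, [round(r, 2) for r in rates], "growth factor", round(growth(rates), 1))
```

&lt;p&gt;The per-file cost of the routine on its own stays flat at about 1.5s, whereas your whole-program figures go from about 5.7s per file to 15s per file. That growth is what points at something non-linear elsewhere.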

&lt;div class="pmsig"&gt;&lt;div class="pmsig-171588"&gt;
&lt;hr /&gt;
&lt;font size=1 &gt;
&lt;div&gt;With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'&lt;/div&gt;
&lt;div&gt;Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.&lt;/div&gt;
&lt;div&gt;"Science is about questioning the status quo. Questioning authority". &lt;/div&gt;
&lt;div&gt;In the absence of evidence, opinion is indistinguishable from prejudice.
&lt;p align=right&gt; [http://thebottomline.cpaaustralia.com.au/|RIP Neil Armstrong]&lt;/p&gt;&lt;/div&gt;
&lt;/font&gt;

&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
1001087</field>
<field name="parent_node">
1001087</field>
</data>
</node>
