<?xml version="1.0" encoding="windows-1252"?>
<node id="555116" title="Re: Perl Code Runs Extremely Slow" created="2006-06-13 16:06:37" updated="2006-06-13 12:06:37">
<type id="11">
note</type>
<author id="159343">
samtregar</author>
<data>
<field name="doctext">
There are so many performance problems in this code that it's hard to know where to begin!  Here are a few that jump out right away:
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;Don't reopen file 2 and read through every line of it for each line of file 1!  If file 1 has 1 million lines and file 2 has 500 thousand, you'll read 500 &lt;b&gt;billion&lt;/b&gt; lines from file 2 (1 million times 500 thousand)!  Instead, read file 2 once into a hash and reuse that hash for every lookup.&lt;/li&gt;
&lt;li&gt;Don't re-sort all the keys from file 1 every time you read a line from file 1. (&lt;b&gt;UPDATE:&lt;/b&gt; Looking again, I see that %fets is actually local to the while() loop.  Why are you using a hash at all here?  And why call sort() when only one key is present?)&lt;/li&gt;
&lt;li&gt;You may not have enough RAM to hold all of file 2 at once.  If you don't, the process will start swapping, which will be slow no matter what else you do.  You can avoid this by keeping the hash in an on-disk database file via [cpan://DB_File] or something similar.&lt;/li&gt;
&lt;/ul&gt;
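&lt;p&gt;
Here is a minimal sketch of the first and third suggestions.  It assumes that both files are tab-separated with the key in the first column, because the original code isn't reproduced here.  The names load_lookup and join_files are hypothetical and are not taken from the original post.  File 2 is read once into a hash, which can optionally be tied to DB_File so it lives on disk.  File 1 is then streamed with a single hash lookup per line:
&lt;code&gt;
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use IO::File;

# Hypothetical helper: build the lookup table from file 2 exactly ONCE.
# ASSUMPTION: tab-separated lines, key in column 1, value in the rest.
# If $db_path is given, the hash is tied to an on-disk DB_File database
# so it doesn't have to fit in RAM (an existing db file is reused, not cleared).
sub load_lookup {
    my ($path, $db_path) = @_;
    my %lookup;
    if (defined $db_path) {
        require DB_File;    # only loaded when a disk-backed hash is wanted
        tie(%lookup, 'DB_File', $db_path, O_RDWR | O_CREAT, 0666)
            or die "Cannot tie $db_path: $!";
    }
    my $fh = IO::File-&gt;new($path, 'r') or die "Cannot open $path: $!";
    while (defined(my $line = $fh-&gt;getline)) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $key and length $key;
        $lookup{$key} = $value // '';
    }
    $fh-&gt;close;
    return \%lookup;
}

# Hypothetical helper: stream file 1 once, doing one hash lookup per line
# instead of re-reading file 2.  ASSUMPTION: the key is column 1 here too.
# Lines whose key is found are printed with the file 2 value appended.
sub join_files {
    my ($file1, $lookup, $out) = @_;
    my $fh = IO::File-&gt;new($file1, 'r') or die "Cannot open $file1: $!";
    while (defined(my $line = $fh-&gt;getline)) {
        chomp $line;
        my ($key) = split /\t/, $line, 2;
        next unless defined $key and exists $lookup-&gt;{$key};
        print {$out} "$line\t$lookup-&gt;{$key}\n";
    }
    $fh-&gt;close;
}
&lt;/code&gt;
&lt;p&gt;
Usage would be something like my $lookup = load_lookup($file2) (or load_lookup($file2, 'file2.db') if file 2 won't fit in RAM), followed by a single join_files($file1, $lookup, \*STDOUT).  Each file is read exactly once, so the 1 million / 500 thousand example above drops from 500 billion line reads to about 1.5 million.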
&lt;p&gt;
-sam
</field>
<field name="root_node">
555114</field>
<field name="parent_node">
555114</field>
</data>
</node>
