I agree that reducing the number of seeks is vital. Given an average seek time of 3 msec, you get at most about 333 seeks per second, which is of course glacial. Ignoring buffering, the original code effectively needed two or more seeks per line, and the improved version still required at least one. In the example I presented, the number of seeks is a function of the number of files we need to create, not the number of lines in the input file. That is a significant improvement provided that the number of unique files is smaller than the number of input lines.
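To make the idea concrete, here is a minimal sketch in Python of the "group first, write once per file" approach. It is not the code from the earlier posts. The function name `split_lines_by_key`, the `key_func` callback, and the sample data are all made up for illustration, and it assumes the input is small enough to bucket in memory.

```python
import os
import tempfile
from collections import defaultdict


def split_lines_by_key(lines, key_func, out_dir):
    """Bucket lines in memory, then write each output file exactly once.

    key_func is a hypothetical callback mapping a line to its output
    file name. Returns the number of files opened, which stands in for
    the number of seeks the disk has to perform.
    """
    buckets = defaultdict(list)
    for line in lines:
        buckets[key_func(line)].append(line)

    opens = 0
    for name, bucket in buckets.items():
        # One open and one sequential write per unique file,
        # no matter how many input lines map to it.
        with open(os.path.join(out_dir, name), "a") as handle:
            handle.writelines(bucket)
        opens += 1
    return opens


# Illustrative data: six input lines that map to three output files.
sample = ["a,1\n", "b,2\n", "a,3\n", "c,4\n", "b,5\n", "a,6\n"]
out_dir = tempfile.mkdtemp()
opens = split_lines_by_key(
    sample, lambda line: line.split(",")[0] + ".txt", out_dir
)
```

With six lines and three unique keys, the sketch opens three files rather than six. If the input does not fit in memory, the same idea still applies by flushing the buckets in batches, at the cost of a few extra opens per file.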