<?xml version="1.0" encoding="windows-1252"?>
<node id="433593" title="Displaying/buffering huge text files" created="2005-02-23 01:19:45" updated="2005-08-15 14:12:00">
<type id="115">
perlquestion</type>
<author id="186172">
spurperl</author>
<data>
<field name="doctext">
Fellow monks,
&lt;p&gt;
My GUI application contains multiple windows. One of them displays a text file, read-only. The user can scroll the file, select portions of it, run "Find" on it: simple stuff.&lt;p&gt;
The problems begin when this file is very big: hundreds of MB. The application naturally tries to load it wholly into memory, and BOOM.&lt;p&gt;
Web searches brought up surprisingly few results. Even most of the popular text editors ignore this problem and just choke on files that are too big.&lt;p&gt;
But I know it's possible, because some editors do it, and it sounds possible in theory.&lt;p&gt;
&lt;readmore&gt;
The problem with text files is that we can't seek straight to a given line, because lines vary in length. In a binary file with fixed-size records it's possible; in a text file it's not.&lt;p&gt;
I can display a single page to the user (perhaps padded by a buffer above and below). As far as the user is concerned, all the rest is virtual. But there's a problem: say the user drags the scroll bar to some faraway location in the file, line 999999 for example. How do I get there quickly?
&lt;p&gt;
One solution is to just read the file line by line until line 999999. This is slow.&lt;p&gt;
Another solution: when the file is initially opened, I read it and create an index table: line -&gt; byte. Say, line 225 starts at byte 1069 in the file. Then I can immediately go to the desired line by seeking in the file.&lt;p&gt;
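For illustration, the index table could be sketched roughly like this. It is only a minimal sketch: the sub names build_index and read_line_at are made up, line numbers are assumed to be 1-based, and the file is read in binmode so that the offsets are plain byte positions.&lt;p&gt;

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY SEEK_SET);

# Scan the file once and remember the byte offset where each line starts.
# build_index and read_line_at are hypothetical names used for this sketch.
sub build_index {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDONLY) or die "Cannot open $path: $!";
    binmode($fh);                  # work in raw bytes, no newline translation
    my @offsets;
    my $pos = tell($fh);           # start of the first line
    while (defined(my $line = readline($fh))) {
        push @offsets, $pos;       # where the line just read began
        $pos = tell($fh);          # where the next line will begin
    }
    close($fh);
    return \@offsets;
}

# Jump straight to a line (1-based) using the offsets from build_index.
sub read_line_at {
    my ($path, $offsets, $line_no) = @_;
    $line_no =~ /^[1-9][0-9]*$/ or die "Bad line number: $line_no";
    my $offset = $$offsets[$line_no - 1];
    defined $offset or die "No such line: $line_no";
    sysopen(my $fh, $path, O_RDONLY) or die "Cannot open $path: $!";
    binmode($fh);
    seek($fh, $offset, SEEK_SET) or die "Cannot seek in $path: $!";
    my $line = readline($fh);
    close($fh);
    return $line;
}
```

Building the index still costs one full pass over the file, but after that any line is a single seek away.&lt;p&gt;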
There's a problem: 1 million lines =&gt; about 8 million bytes to store the index. That is still quite a lot of memory. (There could just as well be 10 million lines, as far as I'm concerned.)&lt;p&gt;
So, I can keep this index in a separate, binary file. When the user asks for line 999999, I go to my binary index file, quickly seek to record 999999, read the starting byte offset of that line in the text file and jump there.&lt;p&gt;
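The binary index file idea might look something like the sketch below. The names write_index_file and fetch_line are invented for illustration, each record is assumed to be one 8-byte offset written with pack 'Q' (which needs a perl built with 64-bit integers), and line numbers are assumed to be 1-based.&lt;p&gt;

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_WRONLY O_CREAT O_TRUNC SEEK_SET);

my $REC_SIZE = 8;    # bytes per index record: one 64-bit offset per line

# Write one fixed-size record per text line into the index file.
# write_index_file and fetch_line are hypothetical names for this sketch.
sub write_index_file {
    my ($text_path, $index_path) = @_;
    sysopen(my $in, $text_path, O_RDONLY) or die "Cannot open $text_path: $!";
    sysopen(my $out, $index_path, O_WRONLY | O_CREAT | O_TRUNC)
        or die "Cannot create $index_path: $!";
    binmode($in);
    binmode($out);
    my $pos   = tell($in);
    my $count = 0;
    while (defined(my $line = readline($in))) {
        print {$out} pack('Q', $pos);    # native-endian 64-bit offset
        $pos = tell($in);
        $count++;
    }
    close($in);
    close($out) or die "Cannot close $index_path: $!";
    return $count;                       # number of lines indexed
}

# Look up a line (1-based): seek to its record, read the offset, jump there.
sub fetch_line {
    my ($text_path, $index_path, $line_no) = @_;
    $line_no =~ /^[1-9][0-9]*$/ or die "Bad line number: $line_no";
    sysopen(my $idx, $index_path, O_RDONLY) or die "Cannot open $index_path: $!";
    binmode($idx);
    seek($idx, ($line_no - 1) * $REC_SIZE, SEEK_SET) or die "Cannot seek: $!";
    my $got = read($idx, my $rec, $REC_SIZE);
    close($idx);
    (defined $got and $got == $REC_SIZE) or die "No such line: $line_no";
    my $offset = unpack('Q', $rec);

    sysopen(my $txt, $text_path, O_RDONLY) or die "Cannot open $text_path: $!";
    binmode($txt);
    seek($txt, $offset, SEEK_SET) or die "Cannot seek: $!";
    my $line = readline($txt);
    close($txt);
    return $line;
}
```

With fixed-size records the index itself never has to be held in memory; each lookup costs two seeks and two small reads.&lt;p&gt;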
Does this sound logical? Can you think of simpler solutions?&lt;p&gt;
Thanks in advance</field>
</data>
</node>
