in reply to Re: Displaying/buffering huge text files
in thread Displaying/buffering huge text files

Hi BrowserUk

Your indexing program is very neat. I never realized that it can be so simple and effective (in perl). Thanks!

I tried it on some large files (50 to 500 MB, with average line length about 150 characters).

I quickly spotted speed a problem with

$index .= pack 'd', tell FILE while <FILE>;
... it has to copy all previously packed data in every .= operation, so that the time grows with the square of number of lines. A big oh, O(x^2) to be precise.

Here is my (almost) drop-in replacement which trades memory space for indexing time

my @index = ( pack 'd', 0 ); push @index, pack 'd', tell FILE while <FILE>; pop @index; my $index = join '', @index;
... and the timing that shows roughly O(x) times for mine, and O(x^2) for yours (you can see the parabola in the table, if you look at it sideways).

In the last test case (the 527 MB file) with my script version the process memory usage peaked at +270 MB for a final index size of 27.5 MB.

Indexing mine : time 1 s, size 19949431, lines 136126 Indexing mine : time 2 s, size 40308893, lines 258457 Indexing mine : time 5 s, size 95227350, lines 634392 Indexing mine : time 29 s, size 527423877, lines 3441911 Indexing yours: time 2 s, size 19949431, lines 136127 Indexing yours: time 6 s, size 40308893, lines 258458 Indexing yours: time 31 s, size 95227350, lines 634393 Indexing yours: time 809 s, size 527423877, lines 3441912
I also added pop @index;, to get rid of the last index - it points to the end of the file, after the last line.