Re^2: Displaying/buffering huge text files
by Rudif (Hermit) on Mar 28, 2005 at 19:39 UTC
Your indexing program is very neat. I never realized that it can be so simple and effective (in perl). Thanks!
I tried it on some large files (50 to 500 MB, with average line length about 150 characters).
I quickly spotted a speed problem with
... it has to copy all previously packed data in every .= operation, so that the time grows with the square of the number of lines. A big oh, O(x^2) to be precise.
Here is my (almost) drop-in replacement which trades memory space for indexing time
... and the timing that shows roughly O(x) times for mine, and O(x^2) for yours (you can see the parabola in the table, if you look at it sideways).
In the last test case (the 527 MB file) with my script version the process memory usage peaked at +270 MB for a final index size of 27.5 MB.
I also added pop @index; to get rid of the last index entry, which points to the end of the file, after the last line.
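
For anyone reading along without the scripts at hand, here is a minimal sketch of the general idea described above: collect the line offsets in a plain array, drop the trailing end-of-file entry, and pack only once at the end. This is not the script from this thread. The names build_index and get_line and the 'N' (32-bit, so files under 4 GB) pack format are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a packed string holding the byte offset
# at which each line of the file starts.
sub build_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                      # count raw bytes, no newline translation
    my @index = (0);                  # the first line starts at offset 0
    while (<$fh>) {
        push @index, tell($fh);       # offset where the next line would start
    }
    close $fh;
    pop @index;                       # last entry points at end of file, not at a line
    return pack 'N*', @index;         # pack once, instead of .= per line
}

# Hypothetical helper: fetch line $n (zero-based) using the packed index.
sub get_line {
    my ($fh, $packed, $n) = @_;
    my $offset = unpack 'N', substr($packed, 4 * $n, 4);
    seek($fh, $offset, 0) or die "seek failed: $!";
    return scalar <$fh>;
}

# Usage (file name is a placeholder):
#   my $packed = build_index('huge.log');
#   open my $fh, '<', 'huge.log' or die $!;
#   binmode $fh;
#   print get_line($fh, $packed, 1000);
```

The array costs far more memory per entry than the packed string does, which is the space-for-time trade mentioned above.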