Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: Displaying/buffering huge text files

by spurperl (Priest)
on Feb 24, 2005 at 06:35 UTC ( #433966=note: print w/ replies, xml ) Need Help??


in reply to Displaying/buffering huge text files

The idea of storing every n-th index in the index table that a couple of monks brought up works even better than expected. I wasn't sure about it at first, but since my reading of the file is buffered, it emposes no performance cost.

My approach now is: I have a BUFFER_BLOCK, currently 1000 lines long. I store every BUFFER_BLOCK lines in the index, that is, 0th line, 1000th line, etc. When the class is asked for a line which is not in the buffer, it rounds the line to the highest BUFFER_BLOCK (I.e. for line 7781 it goes to 7000), grabs another BUFFER_BLOCK lines down an another up (that is 6000-9000 for line 7781) and returns the desired line.

This works like magic, and blazingly fast. I'm experimenting with a 100MB file now (~6 million lines). Reading and indexing it (I'm doing it now in C++ and the smaller amount of push_back to the vector gives gains) takes below 2 seconds ! Afterwards, accesses to lines that are not in buffer take ~70 ms (in buffer is immediate of course).

Memory consumption: the index table takes 4*1/BUFFER_BLOCK bytes for each line. That is, in the gigantic file I'm testing, it takes only 24 KB.

The buffer itself is 3000 lines at 30 chars / line on average, only 90 KB or so.

So, the class "mirrors" a 100 MB file, taking only about 120 KB of memory and working blazingly fast.

Thanks for all the good and interesting answers, monks. I wonder, though, if Perl can match C++'s speed here. Indexing a 100 MB file in 1.7 seconds is quite impressive.


Comment on Re: Displaying/buffering huge text files

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://433966]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (12)
As of 2015-07-28 07:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (253 votes), past polls