
Re: Displaying/buffering huge text files

by crenz (Priest)
on Feb 23, 2005 at 14:51 UTC ( #433704=note: print w/replies, xml ) Need Help??

in reply to Displaying/buffering huge text files

This is an excellent question with excellent answers. May I suggest you put the whole thing into a module and on CPAN? It might be useful for many others as well.

I like the idea of only indexing every 10th or 25th line, then skipping forward on read. Most OSes read a whole block at a time anyway, so for most files a single disk read will pull in many neighbouring lines; you might as well make use of them. Of course, if it's in a module, the skipping could even be handled transparently (and customized by setting a parameter), and the user could just do a $file->GetLine(100_000) without worrying about what's going on.
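A minimal sketch of that sparse-index scheme (in Python for illustration; the class and method names, including `SparseLineIndex` and `get_line`, are invented, not from the thread):

```python
# Sparse line index: scan the file once, recording the byte offset of
# every STEP-th line. To serve a random line, seek to the nearest
# recorded checkpoint and skip forward at most STEP-1 lines.

class SparseLineIndex:
    def __init__(self, path, step=25):
        self.path = path
        self.step = step
        self.offsets = [0]            # offsets[i] = byte offset of line i*step
        with open(path, "rb") as f:
            lineno = 0
            for line in f:
                lineno += 1
                if lineno % step == 0:
                    self.offsets.append(f.tell())

    def get_line(self, n):
        """Return 0-based line n, seeking to the nearest checkpoint first."""
        with open(self.path, "rb") as f:
            f.seek(self.offsets[n // self.step])
            for _ in range(n % self.step):
                f.readline()          # skip the lines before the one we want
            return f.readline().decode()
```

The trade-off is the usual time/space one: a larger `step` shrinks the index by that factor, at the cost of skipping up to `step - 1` lines per lookup (lines which, as noted above, the OS has likely read into cache anyway).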

One more idea: you could read and index only $n lines initially, then provide a callback routine that can be called regularly to read and index $m more lines, until the file is fully indexed. That way a text editor can display the first few lines very quickly and continue indexing in the background, by calling your callback from a separate thread or from the main thread's GUI loop.
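A sketch of that incremental scheme, assuming a GUI idle loop that calls `index_more()` repeatedly (Python for illustration; `IncrementalIndex` and its methods are invented names):

```python
# Incremental indexing: index a chunk of lines up front, then let the
# caller (e.g. a GUI idle callback) index the rest a chunk at a time.

class IncrementalIndex:
    def __init__(self, path, chunk=1000):
        self.f = open(path, "rb")
        self.chunk = chunk
        self.offsets = [0]        # offsets[i] = byte offset of line i
        self.done = False
        self.index_more()         # index the first chunk immediately

    def index_more(self):
        """Index up to `chunk` more lines; returns False when finished."""
        for _ in range(self.chunk):
            line = self.f.readline()
            if not line:
                self.done = True
                return False      # nothing left to index
            self.offsets.append(self.f.tell())
        return True               # more work remains

    def get_line(self, n):
        """Fetch an already-indexed line without disturbing indexing."""
        pos = self.f.tell()       # remember where indexing left off
        self.f.seek(self.offsets[n])
        line = self.f.readline().decode()
        self.f.seek(pos)          # resume indexing from the same spot
        return line
```

The same idea combines naturally with the sparse index above (record only every Nth offset inside `index_more`).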

Replies are listed 'Best First'.
Re^2: Displaying/buffering huge text files
by nothingmuch (Priest) on Mar 01, 2005 at 12:49 UTC
    In fact, you can stat the file to get the system's preferred block size, make your index a mapping from line range to block number, seek to block * block_size, and skip lines from there.

    Going through even 500 lines is nearly instantaneous, probably less than a screen redraw, and I'd guess that's about as much as you can expect to fit in a 4k block, which is pretty much the standard.

    The advantage is that, if you're careful about off-by-one errors, you can minimize disk access so cleanly that after several seeks the relevant blocks will all be in cached pages, and subsequent reads will be cheap.
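A sketch of the block-aligned variant (Python for illustration; `build_block_index` and `get_line` are invented names, and peeking at the byte before the block boundary is one way to resolve the off-by-one mentioned above):

```python
import bisect
import os

def build_block_index(path, blksize=None):
    """index[b] = number of newlines occurring before block b."""
    if blksize is None:
        # the filesystem's preferred I/O block size, per stat()
        blksize = os.stat(path).st_blksize or 4096
    index, total = [], 0
    with open(path, "rb") as f:
        while chunk := f.read(blksize):
            index.append(total)
            total += chunk.count(b"\n")
    return blksize, index

def get_line(path, blksize, index, n):
    """Fetch 0-based line n: seek to a block boundary, then skip lines."""
    # last block known to start strictly before line n begins
    b = max(bisect.bisect_left(index, n) - 1, 0)
    with open(path, "rb") as f:
        lineno = index[b]
        pos = b * blksize
        if pos > 0:
            f.seek(pos - 1)            # peek at the byte before the block
            if f.read(1) != b"\n":     # landed mid-line: the off-by-one case
                f.readline()           # discard the straddling line's tail
                lineno += 1
        while lineno < n:
            f.readline()
            lineno += 1
        return f.readline().decode()
```

Because seeks always land on block boundaries, each lookup touches at most a few blocks, which is what lets repeated lookups hit already-cached pages.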

    As for implementation, the Event module has good I/O handling, idle callbacks, and Tk integration; it could be used to index the file incrementally without threading, if threads seem scary.

    Update: I suddenly remembered a snippet in some perlfaq that uses tr to count the number of lines in a file efficiently, reading in 4k increments. This should make indexing very quick: keep a running sum, increment it for every block, and record the intermediate results somewhere.
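The perlfaq idiom alluded to is roughly `$count += ($buffer =~ tr/\n//)` over 4096-byte reads. The same running-sum idea, with the intermediate sums recorded per block as suggested, looks like this in Python (`count_lines` is an invented name):

```python
# Count lines by scanning fixed-size chunks and counting newlines,
# rather than splitting the input into lines. Recording the running
# total before each chunk doubles as a per-block line index.

def count_lines(path, blksize=4096):
    total, sums = 0, []
    with open(path, "rb") as f:
        while chunk := f.read(blksize):
            sums.append(total)                # lines seen before this block
            total += chunk.count(b"\n")       # analogue of Perl's tr/\n//
    return total, sums
```

The `sums` list is exactly the block-to-line-range mapping the reply above describes, so a single fast pass yields both the line count and the index.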

    zz zZ Z Z #!perl
