PerlMonks  

Re: Read Some lines in Tera byte file

by sundialsvc4 (Abbot)
on Oct 13, 2010 at 11:40 UTC


in reply to Read Some lines in Tera byte file

An approximate index (e.g., the byte offset of every thousandth line of data) is probably a very reasonable approach to use here.   (SQLite is amazingly useful for storing such things.)   You really only have to get the computer “into the general neighborhood,” because when it does the disk seek it’s going to bring in several sectors’ worth of data anyway.
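For what it's worth, here is a minimal sketch of that sparse-index idea, written in Python only for brevity (Perl's seek and tell would do the same job). The function names and the `every` parameter are made up for illustration, and the index is kept in a plain in-memory list; for a real terabyte file you would more likely persist the offsets somewhere such as an SQLite table.

```python
def build_sparse_index(path, every=1000):
    """Scan the file once and record the byte offset of lines 0, every, 2*every, ..."""
    offsets = []
    pos = 0
    with open(path, "rb") as fh:          # binary mode so offsets are exact byte counts
        for lineno, line in enumerate(fh):
            if lineno % every == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_line(path, offsets, target, every=1000):
    """Return line `target` (0-based): seek to the nearest indexed line, then scan forward."""
    slot = target // every
    if target < 0 or slot >= len(offsets):
        raise IndexError("line number out of range")
    with open(path, "rb") as fh:
        fh.seek(offsets[slot])            # land "in the general neighborhood"
        for _ in range(target - slot * every):
            if not fh.readline():         # ran off the end of the file
                raise IndexError("line number out of range")
        line = fh.readline()
        if not line:
            raise IndexError("line number out of range")
        return line
```

Building the index still costs one full pass over the file, but after that each lookup reads at most `every` lines instead of everything before the target.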

Another very useful technique, if you can manage it, is to first sort your update (or search) keys into the same order as the file itself.   Now, you can move through the data one time, perhaps sequentially.   Whatever updates or changes you need to make to any particular region of the file, you will be able to do “all at once, and then move on.”
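A rough sketch of the "sort the keys first" idea, again in Python and again with hypothetical names. It assumes the keys are simply 0-based line numbers; if your keys are values stored in the file, the same pattern applies as long as the file is ordered by that key.

```python
def fetch_lines(path, wanted):
    """Collect the wanted 0-based line numbers in a single forward pass over the file."""
    targets = iter(sorted(set(wanted)))   # put the requests into file order, drop duplicates
    results = {}
    nxt = next(targets, None)
    if nxt is None:
        return results
    with open(path, "rb") as fh:
        for lineno, line in enumerate(fh):
            if lineno == nxt:
                results[lineno] = line
                nxt = next(targets, None)
                if nxt is None:           # nothing left to look for: stop reading early
                    break
    return results
```

Because the requests are in file order, the file pointer never has to move backward, and the pass stops as soon as the last wanted line has been seen. Line numbers past the end of the file are simply absent from the result.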

These strategies were, of course, absolutely necessary when the only “mass” storage devices we possessed were digital reel-to-reel tapes that stored a few hundred bytes per inch, but they are still surprisingly apropos to this day.   Although we have high-density disks that rotate at thousands of RPM, many of our “ruling constraints” when dealing with large data sets are still physical ones: “seek time” and “rotational latency.”

Or, in this case ... network time and bandwidth!   Is it possible, for instance, to do this work on the server computer directly?   When dealing with a huge network-based file, you really, really want to do that... because otherwise, every one of those trillions of bytes is going to be transmitted down the pipe between the two computers.   Z-z-z-z-z-z-z....
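To put a rough number on that, here is a back-of-the-envelope calculation in Python. The 1 TB file size and the 1 Gbit/s link are assumptions picked purely for illustration, and the figure ignores protocol overhead and contention, so real transfers would be slower.

```python
def transfer_hours(size_bytes, link_bits_per_sec):
    """Best-case time, in hours, to push size_bytes through a link of the given speed."""
    return size_bytes * 8 / link_bits_per_sec / 3600


# Assumed figures: a 1 TB (10**12 byte) file over an ideal 1 Gbit/s link.
hours = transfer_hours(10**12, 10**9)     # 8000 seconds, i.e. about 2.2 hours
```

And that is for one full read of the file; every additional pass over the network pays the same price again.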

