An approximate index (e.g. the position of every thousandth line of data) is probably a very reasonable approach here. (SQLite is amazingly useful for such things.) You really only have to get the computer “into the general neighborhood,” because when it does the disk seek it’s going to bring in several sectors’ worth of data anyway.
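To make the “general neighborhood” idea concrete, here is a minimal sketch, not a definitive implementation. It assumes things the post does not state: that the file is sorted by key, that each line looks like `key<TAB>value`, and that the index fits in memory. The names `build_sparse_index` and `lookup` are made up for illustration. The same (key, offset) pairs could just as well live in a small SQLite table.

```python
import bisect
import io

def build_sparse_index(f, every=1000):
    """Record (key, byte offset) for every `every`-th line of a sorted file.

    Assumption: `f` is opened in binary mode and lines are `key\tvalue\n`.
    """
    keys, offsets = [], []
    line_no = 0
    while True:
        pos = f.tell()              # byte offset of the line about to be read
        line = f.readline()
        if not line:
            break
        if line_no % every == 0:
            keys.append(line.split(b"\t", 1)[0])
            offsets.append(pos)
        line_no += 1
    return keys, offsets

def lookup(f, keys, offsets, key):
    """Seek to the indexed neighborhood, then scan forward for `key`."""
    i = bisect.bisect_right(keys, key) - 1   # last indexed key <= target
    if i < 0:
        return None                          # target sorts before the whole file
    f.seek(offsets[i])
    for line in iter(f.readline, b""):
        k, _, value = line.rstrip(b"\n").partition(b"\t")
        if k == key:
            return value
        if k > key:                          # passed the spot: key is absent
            break
    return None

if __name__ == "__main__":
    # Hypothetical in-memory stand-in for a large sorted file.
    demo = io.BytesIO(b"".join(b"%06d\tv%d\n" % (i, i) for i in range(5000)))
    demo_keys, demo_offsets = build_sparse_index(demo, every=1000)
    print(lookup(demo, demo_keys, demo_offsets, b"004321"))
```

Each lookup costs one seek plus a forward scan of at most `every` lines, which is the trade the approximate index is making: a tiny index in exchange for a short sequential read.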
Another very useful technique, if you can manage it, is to first sort your update (or search) keys into the same order as the file itself. Now you can move through the data one time, perhaps purely sequentially. Whatever updates or changes you need to make to any particular region of the file, you will be able to do “all at once, and then move on.”
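A rough sketch of that single pass, under the same assumptions as before (a key-sorted file of `key<TAB>value` lines, which the post does not actually specify). The function name `apply_sorted_updates` is hypothetical, and writing a fresh output file rather than updating in place is just one way to do it.

```python
import io

def apply_sorted_updates(src, dst, updates):
    """Copy `src` to `dst` in one sequential pass, applying `updates`.

    Assumptions: `src` is a binary file sorted by key, with lines of the form
    `key\tvalue\n`; `updates` maps key -> new value (both bytes).
    Returns the number of lines that were changed.
    """
    pending = iter(sorted(updates.items()))  # same order as the file
    upd = next(pending, None)
    applied = 0
    for line in src:
        key, _, value = line.rstrip(b"\n").partition(b"\t")
        # Drop updates whose keys sort before this line: not in the file.
        while upd is not None and upd[0] < key:
            upd = next(pending, None)
        if upd is not None and upd[0] == key:
            value = upd[1]
            applied += 1
            upd = next(pending, None)
        dst.write(key + b"\t" + value + b"\n")
    return applied

if __name__ == "__main__":
    # Hypothetical tiny file; a real one would be opened with open(path, "rb").
    source = io.BytesIO(b"apple\t1\ncherry\t3\nfig\t5\n")
    target = io.BytesIO()
    apply_sorted_updates(source, target, {b"fig": b"50", b"cherry": b"30"})
    print(target.getvalue().decode())
```

Because both the file and the updates advance in the same order, the read position never has to move backwards, so no region of the file is visited twice.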
These strategies were, of course, absolutely necessary when the only “mass” storage devices we possessed were digital reel-to-reel tapes that stored a few hundred bytes per inch, but they are still surprisingly apropos to this day. Although we have high-density disks that rotate at thousands of RPM, many of our “ruling constraints” when dealing with large data sets are still physical ones: “seek time” and “rotational latency.”
Or, in this case ... network time and bandwidth! Is it possible, for instance, to do this work directly on the server? When dealing with a huge network-based file, you really, really want to do that ... because otherwise, every one of those trillions of bytes is going to be transmitted down the pipe between the two computers. Z-z-z-z-z-z-z....