comment on

Extending on Moritz’s idea a little bit more, another trick is to scan the file stem-to-stern once, noting where the “important pieces” begin and end, and what the “key values” are that you will use when searching for those records. Insert the keys into a hash, with a file-position (or a list of file-positions) as the value. Then, after this one sequential pass through the entire file, you can seek() randomly to those positions at any time thereafter. (If along the way you have noted both the starting-position and the size of the entry, you can “slurp” any particular record into, say, a string variable fairly effortlessly.) This is a useful technique to apply to files that are “loosely” structured, as this one seems to be.

Now, if you happen to know that the two files are sorted, and specifically that they are sorted the same way ... if you can positively assert based on some outside knowledge that this is true, and that this always will be true, with regard to these files ... then your logic becomes a good bit simpler because you can simply read the two files sequentially and do everything in just one forward pass, just as they used to do when the only mass-storage device of any reasonable size that you had at your disposal was a tape-drive. It would be too-messy to sort them yourself, and maybe you do not want to risk that they might be, ahem, “out of sorts,” but it’s a handy trick to use (and, bloody fast ...) when you know that you can.

In reply to Re: Using less memory with BIG files by sundialsvc4
in thread Using less memory with BIG files by jemswira

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Perl-Sensitive Sunglasses
	PerlMonks