I deal with somewhat similar problems on huge data files just about every day, and I am fairly confident that it should be possible to read the file only once (or at most twice). However, you don't give enough information about the structure of the file.
Is my understanding correct that you first have a bunch of identifier lines (1000+), and then your data lines? And that the identifier lines somehow give the rules for what to do with the data lines? Or do you have one identifier line giving information about what to do with the next data line or the next few data lines?
Please tell us more about the identifiers: do they say on which data line numbers to do something? Or which field to extract in the data line?
In any case, I believe it should be possible to read your file sequentially only once, recording what you find in each identifier line and using it to process the data lines that come afterwards. But I can't say more on how to do it without a better idea of your data format or, even better, a simplified sample of your file content together with some explanation of how the identifiers are used to analyze the data lines.
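Just to illustrate the general idea, here is a minimal single-pass sketch in Python. Everything about the format is invented for the example, since I don't know yours: I assume an identifier line looks like `ID <n>` and means "extract comma-separated field number n from the data lines that follow, until the next identifier line". The names `process`, `current_field` and the sample data are hypothetical.

```python
import io

def process(lines):
    """Read the input once, applying the most recent identifier to each data line."""
    current_field = None  # rule taken from the last identifier line seen
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("ID "):
            # Hypothetical identifier line: "ID <field index>"
            current_field = int(line.split()[1])
        elif current_field is not None:
            # Data line: apply the rule recorded from the identifier
            yield line.split(",")[current_field]

# Made-up sample standing in for the real file
sample = io.StringIO("ID 1\na,b,c\nd,e,f\nID 0\ng,h,i\n")
print(list(process(sample)))  # ['b', 'e', 'g']
```

Because the function is a generator and iterates over the file handle line by line, nothing but the current rule is kept in memory, which is what matters on huge files. If all the identifier lines come first and the data lines afterwards, the same loop works with the rules collected into a list or dictionary before the first data line is reached.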