PerlMonks  

Hi,

I deal almost every day with similar problems on huge data files, and I am fairly confident that it should be possible to read the file only once (or at most twice), but you haven't given enough information about the structure of the file.

Is my understanding correct that you first have a bunch of identifier lines (1000+), and then your data lines? And do the identifier lines somehow give the rules for what to do with the data lines? Or do you have one identifier line telling what to do with the next data line or the next several data lines?

Please tell us more about the identifiers: do they specify which data line numbers to act on? Or which field to extract from each data line?

In any case, I believe it should be possible to read your file sequentially only once, record what is in the identifier lines, and use that to process the data lines that follow. But I can't say more about how to do it without a better idea of your data format or, even better, a simplified sample of your file content together with some explanation of how the identifiers are used to analyze the data lines.
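To illustrate the single-pass idea, here is a minimal sketch. The file format is entirely hypothetical, since we haven't seen yours: I assume identifier lines look like `ID <key> <field>` and say which whitespace-separated field to extract from later data lines whose first field is `<key>`. The sub name `process_once` and the sample data are made up for illustration only.

```perl
use strict;
use warnings;

# HYPOTHETICAL format (an assumption, not the OP's actual layout):
#   ID <key> <field>    identifier line: for data lines starting with <key>,
#                       extract whitespace-separated field number <field>
#   <key> <f1> <f2> ... data line
sub process_once {
    my ($fh) = @_;
    my %field_for;    # key => field index, filled from identifier lines
    my @results;
    while (my $line = <$fh>) {    # each line is read exactly once
        chomp $line;
        next unless length $line;
        my @f = split ' ', $line;
        if ($f[0] eq 'ID') {
            # Remember the rule; it applies to the data lines that follow
            $field_for{ $f[1] } = $f[2];
            next;
        }
        my $key = $f[0];
        next unless exists $field_for{$key};    # no rule seen for this key
        push @results, "$key=$f[ $field_for{$key} ]";
    }
    return \@results;
}

# Made-up sample data, read through an in-memory filehandle
my $sample = <<'END';
ID alpha 2
ID beta 1
alpha 10 20 30
beta 40 50 60
gamma 70 80 90
alpha 11 21 31
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $out = process_once($fh);
close $fh;
print "$_\n" for @$out;
```

For a real file, open it by name instead (`open my $fh, '<', $filename or die "...: $!"`). On a huge file you would also print each result as you go rather than collecting it in `@results`, so that memory use stays proportional to the number of identifier rules in `%field_for`, not to the size of the file.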


In reply to Re: Reading HUGE file multiple times by Laurent_R
in thread Reading HUGE file multiple times by Anonymous Monk
