PerlMonks  
You are trying to re-invent a database and query engine.

Keeping all this data in memory will not work unless your computer has enough memory to avoid repeatedly swapping your data in and out, and even then you still have to write a program that searches efficiently through all those arrays.

I would do it as follows:

  1. Read each file into its own database table and index the field with the timestamp. You cannot make the timestamp the key of each record, as you seem to have multiple records with the same timestamp in each file. Use a module such as Date::Parse or DateTime::Format::DateParse to turn the timestamp into a standard format that will be the same in all records.
  2. Use standard SQL to match the timestamps in your FILEA-table with the timestamps in the FILEB-, FILEC-, ... tables.
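
The two steps above could be sketched roughly as follows. This is only an illustration under assumptions of mine: it uses DBI with DBD::SQLite, an in-memory database, made-up table and column names (filea, fileb, ts, data) and made-up sample records. The timestamps are normalised to epoch seconds with str2time from Date::Parse and treated as UTC.

```perl
use strict;
use warnings;
use DBI;                       # assumes DBI and DBD::SQLite are installed
use Date::Parse qw(str2time);  # part of the TimeDate distribution

# In-memory database for the sketch; use a file name for real data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical sample records: [ timestamp as found in the file, data ].
my %files = (
    filea => [ [ '2014-12-27 17:12:00', 'a1' ],
               [ '2014-12-27 17:13:00', 'a2' ] ],
    fileb => [ [ '27 Dec 2014 17:12:00', 'b1' ],
               [ '27 Dec 2014 17:12:00', 'b2' ],
               [ '27 Dec 2014 17:14:00', 'b3' ] ],
);

for my $table (sort keys %files) {
    $dbh->do("CREATE TABLE $table (ts INTEGER NOT NULL, data TEXT)");
    # A plain (non-unique) index, because timestamps can repeat.
    $dbh->do("CREATE INDEX ${table}_ts ON $table (ts)");

    my $sth = $dbh->prepare("INSERT INTO $table (ts, data) VALUES (?, ?)");
    $dbh->begin_work;          # one transaction per file keeps bulk loads fast
    for my $rec (@{ $files{$table} }) {
        # Normalise every timestamp to epoch seconds (assumed to be UTC).
        my $epoch = str2time($rec->[0], 'UTC');
        defined $epoch or die "Cannot parse timestamp '$rec->[0]'";
        $sth->execute($epoch, $rec->[1]);
    }
    $dbh->commit;
}

# Match FILEA records against FILEB records on the normalised timestamp.
my $rows = $dbh->selectall_arrayref(
    'SELECT a.ts, a.data, b.data
       FROM filea a JOIN fileb b ON b.ts = a.ts
      ORDER BY a.ts, a.data, b.data');

print join(' ', @$_), "\n" for @$rows;
```

With more files you would add one JOIN (or LEFT JOIN, if a file may have no match) per extra table.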

Alternatively, but only if all the files have their timestamps in strict time-sequential order, you can open a filehandle to each file and work line by line. Loop through FILEA, extract each timestamp and transform it into a standard format. Then read forward through each of the other files, transforming their timestamps into the same standard format, and output every record whose timestamp matches until you hit a timestamp past that of the main loop. Once all the other FILEs have passed the timestamp of the main file, move on to the next record in the main file.
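
A minimal sketch of that line-by-line approach, again under assumptions of mine: each line looks like "timestamp,data", the sample data is invented, in-memory filehandles stand in for the real files, and a line is emitted only when at least one other file has a matching record. Because FILEA itself may repeat a timestamp, the sketch remembers the group of records each other file matched for the current timestamp instead of reading them only once.

```perl
use strict;
use warnings;
use Date::Parse qw(str2time);

# Read the next record from a handle: [ epoch, data ], or undef at end of file.
# Assumes a hypothetical "timestamp,data" line layout.
sub next_record {
    my ($fh) = @_;
    defined(my $line = <$fh>) or return;
    chomp $line;
    my ($stamp, $data) = split /,/, $line, 2;
    my $epoch = str2time($stamp, 'UTC');
    defined $epoch or die "Cannot parse timestamp '$stamp'";
    return [ $epoch, $data ];
}

sub merge_on_timestamp {
    my ($main_fh, @other_fhs) = @_;

    # Per file: one look-ahead record plus the records matching the current timestamp.
    my @state;
    for my $fh (@other_fhs) {
        my $ahead = next_record($fh);
        push @state, { fh => $fh, ahead => $ahead, ts => undef, group => [] };
    }

    my @out;
    while (my $rec = next_record($main_fh)) {
        my ($ts, $data) = @$rec;
        my @matched;
        for my $s (@state) {
            # Only read forward when the main timestamp has changed.
            unless (defined $s->{ts} && $s->{ts} == $ts) {
                $s->{ts}    = $ts;
                $s->{group} = [];
                while ($s->{ahead} && $s->{ahead}[0] <= $ts) {
                    push @{ $s->{group} }, $s->{ahead}[1] if $s->{ahead}[0] == $ts;
                    $s->{ahead} = next_record($s->{fh});
                }
            }
            push @matched, @{ $s->{group} };
        }
        push @out, join(',', $ts, $data, @matched) if @matched;
    }
    return @out;
}

# In-memory filehandles stand in for the real files.
sub open_string {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    return $fh;
}

my $fa = open_string("2014-12-27 17:12:00,a1\n2014-12-27 17:12:00,a2\n"
                   . "2014-12-27 17:13:00,a3\n2014-12-27 17:15:00,a4\n");
my $fb = open_string("2014-12-27 17:11:00,b0\n2014-12-27 17:12:00,b1\n"
                   . "2014-12-27 17:15:00,b2\n");
my $fc = open_string("2014-12-27 17:12:00,c1\n2014-12-27 17:12:00,c2\n"
                   . "2014-12-27 17:14:00,c3\n");

my @lines = merge_on_timestamp($fa, $fb, $fc);
print "$_\n" for @lines;
```

Each file is read exactly once and only one group of records per file is held in memory, which is the whole point of this variant. It silently gives wrong results if any file is not sorted by time.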

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

In reply to Re: matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by CountZero
in thread matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by Cosmic37
