You are trying to re-invent a database and query engine.

Keeping all this data in memory will not work unless your computer has enough memory to hold it without constantly swapping, and even then you still have to write a program that searches through all those arrays efficiently.

I would do it as follows:

  1. Read each file into its own database table and index the timestamp field. You cannot make the timestamp the primary key of each record, as you seem to have multiple records with the same timestamp in each file. Use a module such as Date::Parse or DateTime::Format::DateParse to turn each timestamp into a standard format that is the same in all records.
  2. Use standard SQL to match the timestamps in your FILEA-table with the timestamps in the FILEB-, FILEC-, ... tables.
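
A minimal sketch of those two steps, using DBI with SQLite (my choice for illustration; any SQL database would do). The file layout is an assumption, since I have not seen your data: I take each line to be comma-separated with the timestamp in the first field. The table names and the subs normalise_ts, load_table and match_tables are made up for this example.

```perl
use strict;
use warnings;
use DBI;
use Date::Parse qw(str2time);
use POSIX qw(strftime);

# Normalise anything Date::Parse understands to 'YYYY-MM-DD HH:MM:SS'.
# Timestamps without a zone are treated as GMT (an assumption).
sub normalise_ts {
    my ($raw) = @_;
    my $epoch = str2time($raw, 'GMT');
    return undef unless defined $epoch;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch));
}

# Load one file into its own table, with a non-unique index on the
# timestamp, because several records may share the same timestamp.
sub load_table {
    my ($dbh, $table, $fh) = @_;
    $dbh->do("CREATE TABLE $table (ts TEXT NOT NULL, payload TEXT)");
    $dbh->do("CREATE INDEX idx_${table}_ts ON $table (ts)");
    my $sth = $dbh->prepare("INSERT INTO $table (ts, payload) VALUES (?, ?)");
    $dbh->begin_work;                  # one transaction keeps bulk inserts fast
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my ($raw, $payload) = split /,/, $line, 2;
        my $ts = normalise_ts($raw);
        next unless defined $ts;       # skip lines whose timestamp cannot be parsed
        $sth->execute($ts, $payload);
    }
    $dbh->commit;
}

# Join two tables on the normalised timestamp.
sub match_tables {
    my ($dbh, $left, $right) = @_;
    return $dbh->selectall_arrayref(
        "SELECT a.ts, a.payload, b.payload
           FROM $left a JOIN $right b ON b.ts = a.ts
          ORDER BY a.ts, a.payload, b.payload"
    );
}
```

You would connect with something like DBI->connect('dbi:SQLite:dbname=merge.db', '', '', { RaiseError => 1 }) (the database name is hypothetical), call load_table once per file, and then match_tables($dbh, 'filea', 'fileb') for each of the other tables. Note that repeated timestamps produce every pairing of the matching rows, which may or may not be what you want.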

Alternatively, but only if all the files have their timestamps in strict time-sequential order, you can open a filehandle to each file and work line by line. Loop through FILEA, extract each timestamp and transform it into a standard format. Then read forward through each of the other files, transforming their timestamps into the same standard format, and output the record whenever the timestamps match, until you hit a timestamp later than the one from the main loop. Do the same for the next file, and once all the files have passed the timestamp of the main file, go on to the next record in the main file.
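
A rough sketch of that line-by-line merge, under the same assumed layout (comma-separated, timestamp first, every file sorted by time). All sub names here are invented for the example. Because FILEA may repeat a timestamp, each secondary file caches the records it matched for the current timestamp so they can be reused. What to do with the matches is also a guess on my part: the sketch appends them to the main record and skips main records that match nothing.

```perl
use strict;
use warnings;
use Date::Parse qw(str2time);

# Read the next parseable record; returns [epoch, payload], or undef at EOF.
sub next_record {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my ($raw, $payload) = split /,/, $line, 2;
        my $epoch = str2time($raw, 'GMT');   # zone-less timestamps assumed GMT
        return [$epoch, $payload] if defined $epoch;
    }
    return undef;                            # explicit undef, safe in list context
}

# A cursor keeps one record of lookahead plus the matches for the last timestamp.
sub new_cursor {
    my ($fh) = @_;
    return { fh => $fh, peek => next_record($fh), group_ts => undef, group => [] };
}

# All payloads in this file with timestamp $ts; never reads past $ts.
sub matches_for {
    my ($cur, $ts) = @_;
    # The main file repeated a timestamp: reuse the cached matches.
    return @{ $cur->{group} }
        if defined $cur->{group_ts} && $cur->{group_ts} == $ts;
    $cur->{group_ts} = $ts;
    $cur->{group}    = [];
    while (my $rec = $cur->{peek}) {
        last if $rec->[0] > $ts;             # past the main timestamp: stop here
        push @{ $cur->{group} }, $rec->[1] if $rec->[0] == $ts;
        $cur->{peek} = next_record($cur->{fh});
    }
    return @{ $cur->{group} };
}

# Walk the main file once; $emit is called with each concatenated line.
sub merge_files {
    my ($emit, $main_fh, @other_fhs) = @_;
    my @cursors = map { new_cursor($_) } @other_fhs;
    while (my $rec = next_record($main_fh)) {
        my ($ts, $payload) = @$rec;
        my @matched = map { matches_for($_, $ts) } @cursors;
        $emit->(join ',', $ts, $payload, @matched) if @matched;
    }
}
```

Called as merge_files(sub { print "$_[0]\n" }, $fh_a, $fh_b, $fh_c), this reads every file exactly once and holds only one timestamp's worth of records per file in memory. The output timestamp is the epoch value; format it as you like.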


"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

In reply to Re: matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by CountZero
in thread matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by Cosmic37
