Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: matching datetimestamps and concatenating data where timestamps match from multiple large datafiles

by CountZero (Bishop)
on Sep 22, 2013 at 12:15 UTC ( #1055180=note: print w/ replies, xml ) Need Help??


in reply to matching datetimestamps and concatenating data where timestamps match from multiple large datafiles

You are trying to re-invent a database and query engine.

Keeping all this data in memory will not work unless you have a computer with a huge memory to avoid the repeatedly swapping in and out of your data and even then you still have to write a program to efficiently search through all those arrays.

I would do it as follows:

  1. Read each file into its own database table and index the field with the timestamp. You cannot make the timestamp the key of each record as you seem to have multiple records with the same timestamp in each file. Use a module such as Date::Parse or DateTime::Format::DateParse to turn the timestamp into a standard format that will be the same in all reocrds.
  2. Use standard SQL to match the timestamps in your FILEA-table with the timestamps in the FILEB-, FILEC-, ... tables.

Alternatively but only if all the files have timestamps in strict time-sequential order, you can open a filehandle to each of the files and on a line-by-line basis, loop through FILEA, extract the timestamp, transform the timestamp into a standard format and then iterate through all other files checking their timestamps (after transforming those also into the same standard format) and output the record when the timestamps match until you hit a timestamp past the timestamp of the main loop. Then you do the same for the next FILE until all FILEs have passed the timestamp of the main file and you go to the next record in the main file.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics


Comment on Re: matching datetimestamps and concatenating data where timestamps match from multiple large datafiles

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1055180]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (9)
As of 2014-08-21 20:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (143 votes), past polls