PerlMonks  


Dear Brethren, I have been trying to think of a better way to do this, but I am afraid I am still too ignorant of Perl. I love Perl when I use it, but I am an intermittent user, so please forgive me. I have time-stamped data, with the date and time somewhere within each line of interesting data, in potentially huge data files: many gigabytes, with millions of lines. I am testing with smaller files of fewer than 100000 lines.

In FILEA the data is logged much less often than once per second (more like twice per minute, though this could vary). The other files (FILEB, FILEC, FILED) have lines logged once per second or even more often, but there is always the possibility of missing data. In principle all the data files overlap in logging time over my analysis period. I want to append to each line of FILEA the data from all the other files that has the same timestamp, i.e. was logged in the same second. If more than one line has the same timestamp, I don't mind taking the first or the last one.

The question is whether anyone can recommend a good method to do this. My simplistic thinking is that I could read all the files into their respective arrays and trawl through the arrays element by element, matching timestamps and concatenating data into an output array, before saving that to an output file. However, I feel sure (despite not actually being sure!) that there is a cleverer and quicker way to do this task. Can someone please enlighten me?
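One possible alternative to loading every file into an array, sketched here under assumptions rather than offered as the definitive answer: because FILEA is the sparse file, only its lines need to stay in memory. They can be indexed in a hash keyed by timestamp, and each of the dense files can then be streamed once, line by line, keeping only the lines whose second also appears in FILEA. The function name `merge_by_timestamp`, the key-extractor callbacks, and the comma used to join fields are all hypothetical choices for illustration. The sketch also assumes that FILEA itself fits in memory, and it keeps the first matching line when a second occurs more than once.

```perl
use strict;
use warnings;

# Hypothetical helper: $a_fh is the sparse file (FILEA), $key_of_a returns a
# timestamp key for a line (or undef), and @others holds [ $fh, $key_of ] pairs
# for the dense files. Requires Perl 5.10+ for the // operator.
sub merge_by_timestamp {
    my ($a_fh, $key_of_a, @others) = @_;

    # Pass 1: keep every FILEA line in order, and note which seconds we need.
    my (@rows, %extra);
    while (my $line = <$a_fh>) {
        chomp $line;
        my $key = $key_of_a->($line);
        next unless defined $key;          # line carries no timestamp
        push @rows, [ $key, $line ];
        $extra{$key} //= [];
    }

    # Pass 2: stream each dense file once; lines for other seconds are dropped.
    for my $i (0 .. $#others) {
        my ($fh, $key_of) = @{ $others[$i] };
        while (my $line = <$fh>) {
            chomp $line;
            my $key = $key_of->($line);
            next unless defined $key && exists $extra{$key};
            $extra{$key}[$i] //= $line;    # first line for that second wins
        }
    }

    # Emit FILEA lines in original order; missing data becomes an empty field.
    return map {
        my ($key, $line) = @$_;
        my @fields = @{ $extra{$key} }[ 0 .. $#others ];
        join ',', $line, map { $_ // '' } @fields;
    } @rows;
}

# Tiny demonstration with in-memory "files" and a made-up integer-second key.
my $demo_a = "10,a1\n40,a2\n";
my $demo_b = "9,b0\n10,b1\n10,b1dup\n";
open my $demo_afh, '<', \$demo_a or die $!;
open my $demo_bfh, '<', \$demo_b or die $!;
my $demo_key = sub { $_[0] =~ /^(\d+),/ ? $1 : undef };
print "$_\n" for merge_by_timestamp($demo_afh, $demo_key, [ $demo_bfh, $demo_key ]);
```

Memory use in this sketch grows with the number of FILEA lines, not with the size of the dense files. If all files are known to be sorted by time, a streaming merge that advances through them in step would avoid even that, at the cost of more bookkeeping.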

A minor complication is that the timestamp format may differ per file. E.g. in FILEA, FILEB and FILEC the date and time components could be matched with m/^\d\.+(\d{2}):(\d{2}):(\d{2})\.+(\d{2})(\d{2})(\d{2})\d{2}.abc/ where $hours=$1; $min=$2; $sec=$3; $daym=$4; $mon=$5-1; $year=100+$6;

In FILED, by contrast, the date and time components could be matched with m/^\"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.+/ where $hours=$4; $min=$5; $sec=$6; $daym=$3; $mon=$2-1; $year=$1-1900;
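Since the two formats yield the same six fields, one way to make them comparable is to reduce each timestamp to a single number of epoch seconds, which can then serve as the matching key. The following is a minimal sketch using `timegm` from the core Time::Local module and the two regexes exactly as given above. The function names `epoch_abc` and `epoch_filed` are hypothetical, and the sample lines are contrived solely so that each regex matches; they are not real data. Using `timegm` rather than `timelocal` assumes all files were logged in the same time zone, so that only equality between keys matters.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# FILEA/FILEB/FILEC style: hh:mm:ss, then day, month and two-digit year.
# Returns epoch seconds, or nothing if the line does not match.
sub epoch_abc {
    my ($line) = @_;
    return unless $line =~
        m/^\d\.+(\d{2}):(\d{2}):(\d{2})\.+(\d{2})(\d{2})(\d{2})\d{2}.abc/;
    # timegm(sec, min, hour, mday, mon (0-based), year); 100+yy is read as 20yy
    return timegm($3, $2, $1, $4, $5 - 1, 100 + $6);
}

# FILED style: "yyyy-mm-dd hh:mm:ss. at the start of the line.
sub epoch_filed {
    my ($line) = @_;
    return unless $line =~
        m/^\"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.+/;
    return timegm($6, $5, $4, $3, $2 - 1, $1 - 1900);
}

# Hypothetical sample lines, invented only to exercise the two regexes.
my $abc_line   = '1..05:10:07..20041400 abc';
my $filed_line = '"2014-04-20 05:10:07.5",1.23';

# Both describe 2014-04-20 05:10:07, so both print the same epoch value.
printf "%d\n%d\n", epoch_abc($abc_line), epoch_filed($filed_line);
```

Note that `timegm` dies on an impossible date (for example day 31 of a 30-day month), so real log processing may want to wrap the call in `eval` and skip such lines.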

Sorry in advance!


In reply to matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by Cosmic37
