Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Dear Brethren, I have been trying to think of a better way to do this but I am afraid I am still too ignorant of perl. I love perl when I use it but I am an intermittent user so please forgive me, etc. I have time stamped data with date and time somewhere within each line of interesting data in potentially rather huge data files of many Gigabytes with millions of lines. I am testing with smaller files of less than 100000 lines.

In FILEA the data is logged less frequently than once per second (actually more like twice per minute but could vary). The other files (FILEB, FILEC, FILED) have data lines logged at once per second or even greater but there is always possibility of missing data. In principle all data files overlap in logging times for my analysis period. I want to append each line of data of FILEA with data from the all other files which has the same time stamp (if there are more than one line with same timestamp I don't mind taking the first or last one as long as the sample has same timestamp, ie logged at the same second).

The question is whether anyone can recommend a good method to do this. My simplistic thinking is that I could read all files into respective arrays and trawl through the arrays element by element matching timestamps and concatenating data in an output array before saving that to an output file. However, I feel sure {despite not actually being sure!} that there is a cleverer and quicker way to do this task. Can someone please enlighten me?

A minor complication is that timestamping format may differ per file. eg in FILEA, FILEB and FILEC the date and time components could be matched with m/^\d\.+(\d{2}):(\d{2}):(\d{2})\.+(\d{2})(\d{2})(\d{2})\d{2}.abc/ where $hours=$1; $min=$2; $sec=$3; $daym=$4; $mon=$5-1; $year=100+$6;

Whereas in FILED the date and time components could be matched with m/^\"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.+/ where $hours=$4; $min=$5; $sec=$6; $daym=$3;$mon=$2-1;$year=$1-1900;

Sorry in advance!

In reply to matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by Cosmic37

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others chanting in the Monastery: (4)
    As of 2018-05-26 20:26 GMT
    Find Nodes?
      Voting Booth?