
Dear Brethren, I have been trying to think of a better way to do this but I am afraid I am still too ignorant of Perl. I love Perl when I use it but I am an intermittent user so please forgive me, etc. I have time-stamped data, with the date and time somewhere within each line of interesting data, in potentially rather huge data files of many gigabytes with millions of lines. I am testing with smaller files of fewer than 100,000 lines.

In FILEA the data is logged less frequently than once per second (actually more like twice per minute, but this could vary). The other files (FILEB, FILEC, FILED) have data lines logged once per second or even more often, but there is always the possibility of missing data. In principle all data files overlap in logging times for my analysis period. I want to append to each line of FILEA the data from all the other files that has the same timestamp (if more than one line has the same timestamp I don't mind taking the first or last one, as long as the sample has the same timestamp, i.e. was logged in the same second).

The question is whether anyone can recommend a good method to do this. My simplistic thinking is that I could read all files into respective arrays and trawl through the arrays element by element matching timestamps and concatenating data in an output array before saving that to an output file. However, I feel sure {despite not actually being sure!} that there is a cleverer and quicker way to do this task. Can someone please enlighten me?
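One common alternative to loading every file into an array, offered here only as a minimal sketch: because FILEA is the sparse file, index its lines in a hash keyed by timestamp, then stream each dense file line by line and keep only the lines whose second is already in the hash. Memory use then scales with FILEA rather than with the large files. The helper names (`stamp_of`, `index_sparse`, `append_matches`, `merge_files`), the `YYYY-MM-DD HH:MM:SS` line format, and the ` | ` separator are all assumptions made for illustration; real files would need one extractor per timestamp format.

```perl
use strict;
use warnings;

# Hypothetical extractor: returns a "YYYY-MM-DD HH:MM:SS" key, or undef
# when the line carries no timestamp in this assumed format.
sub stamp_of {
    my ($line) = @_;
    return $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/ ? $1 : undef;
}

# Pass 1: read the sparse file and remember its lines by timestamp.
sub index_sparse {
    my ($fh) = @_;
    my (@order, %row);
    while (my $line = <$fh>) {
        chomp $line;
        my $key = stamp_of($line);
        next unless defined $key;
        next if exists $row{$key};          # keep the first line per second
        push @order, $key;
        $row{$key} = [$line];               # column 0 holds the sparse line
    }
    return (\@order, \%row);
}

# Pass 2: stream one dense file, keeping only lines whose second is wanted.
sub append_matches {
    my ($fh, $row, $column) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my $key = stamp_of($line);
        next unless defined $key && exists $row->{$key};
        $row->{$key}[$column] = $line
            unless defined $row->{$key}[$column];   # first match wins
    }
    return;
}

# Join the sparse line with one column per dense file; gaps stay empty.
sub merge_files {
    my ($sparse_fh, @dense_fhs) = @_;
    my ($order, $row) = index_sparse($sparse_fh);
    my $column = 1;
    append_matches($_, $row, $column++) for @dense_fhs;
    my $last = @dense_fhs;                  # index of the final column
    return map {
        my $cells = $row->{$_};
        join ' | ', map { defined $cells->[$_] ? $cells->[$_] : '' } 0 .. $last;
    } @$order;
}

# Demo on in-memory "files" (made-up data).
my $file_a = "2015-06-01 10:00:00 A1\n2015-06-01 10:00:30 A2\n";
my $file_b = "2015-06-01 10:00:00 B1\n2015-06-01 10:00:01 B2\n"
           . "2015-06-01 10:00:30 B3\n";
open my $fa, '<', \$file_a or die $!;
open my $fb, '<', \$file_b or die $!;
print "$_\n" for merge_files($fa, $fb);
```

If every file is known to be in chronological order, a streaming merge that advances through all files in step would avoid even the hash, but the hash version does not depend on ordering and tolerates missing seconds.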

A minor complication is that the timestamp format may differ per file, e.g. in FILEA, FILEB and FILEC the date and time components could be matched with m/^\d\.+(\d{2}):(\d{2}):(\d{2})\.+(\d{2})(\d{2})(\d{2})\d{2}.abc/ where $hours=$1; $min=$2; $sec=$3; $daym=$4; $mon=$5-1; $year=100+$6;

Whereas in FILED the date and time components could be matched with m/^\"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.+/ where $hours=$4; $min=$5; $sec=$6; $daym=$3; $mon=$2-1; $year=$1-1900;
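Differing formats stop mattering once each one is reduced to a single comparable key such as epoch seconds. The sketch below does that for the FILED pattern quoted above, using `timegm` from the core Time::Local module; the field assignments follow the post. The function name `filed_epoch` and the sample line are made up for illustration, and using `timegm` rather than `timelocal` assumes every file logs in the same time zone, so only consistency between files matters. The FILEA-style pattern could be wrapped the same way with its own field order.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Pattern quoted in the post for FILED-style lines.
my $FILED_RE = qr/^\"(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\.+/;

# Returns epoch seconds for a FILED-style line, or undef if it does not match.
sub filed_epoch {
    my ($line) = @_;
    return unless $line =~ $FILED_RE;
    # Same field mapping as in the post: month is 0-based, year is offset
    # from 1900, which Time::Local accepts for values in 100..999.
    my ($year, $mon, $daym) = ($1 - 1900, $2 - 1, $3);
    my ($hours, $min, $sec) = ($4, $5, $6);
    return timegm($sec, $min, $hours, $daym, $mon, $year);
}

# Hypothetical sample line; fractional seconds are discarded by the pattern.
my $sample = '"2015-06-01 10:00:30.250",42.7';
my $epoch  = filed_epoch($sample);
print defined $epoch ? "$epoch\n" : "no timestamp found\n";
```

Two lines from different files then belong to the same second exactly when their epoch values are equal, so the value can serve directly as a hash key.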

Sorry in advance!

In reply to matching datetimestamps and concatenating data where timestamps match from multiple large datafiles by Cosmic37
