 
PerlMonks  


Okay, let's assume you have enough memory to hold the entire contents of both files in RAM. The simplest approach is probably to create a hash with the timestamp as the key, and sort on that key.
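A minimal sketch of that hash approach, not tested against real FIX logs. It assumes the timestamp is the last SOH-delimited field on each line and that it sorts correctly as a plain string; the sub names ts_of and merge_by_hash are made up for illustration. A plain hash would overwrite rows that share a timestamp, so each key holds an array of rows:

```perl
use strict;
use warnings;

# Hypothetical helper: the timestamp is assumed to be the last
# SOH-delimited field on the line.
sub ts_of {
    my ($row) = @_;
    chomp( my $copy = $row );
    my @els = split /\x01/, $copy;
    return $els[-1];
}

# Group rows by timestamp (an array per key, so rows with identical
# timestamps are not overwritten), then emit the groups in key order.
sub merge_by_hash {
    my @rows = @_;
    my %by_ts;
    push @{ $by_ts{ ts_of($_) } }, $_ for @rows;
    return map { @{ $by_ts{$_} } } sort keys %by_ts;
}
```

With both files open on $fa and $fb, something like print merge_by_hash( <$fa>, <$fb> ); would write the merged rows to STDOUT.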

However, this doesn't take into account the fact that your original files are probably already sorted. A better algorithm is to compare the leading row of each file and write out whichever has the older timestamp, repeating until you run out of data. Obviously such code has to take into account the possibility of identical timestamps, one file being shorter than the other .... yada yada yada.

This is roughly how to do it. It is not completely debugged Perl, as I'm short on time:

use strict;
use warnings;

sub compare_ts {
    my ($x, $y) = @_;
    # Plain string comparison: returns -1, 0 or 1. This assumes the
    # timestamps are fixed-width and sort correctly as text; change it
    # if yours do not.
    return $x cmp $y;
}

my @els;
my ($tsA, $tsB);

# open the files...
open my $fa, '<', 'fileA' or die "Cannot open fileA: $!";
open my $fb, '<', 'fileB' or die "Cannot open fileB: $!";

# slurp....
my @aFIX = <$fa>;
my @bFIX = <$fb>;

# prime each compare value
my $rowA = shift @aFIX;
my $rowB = shift @bFIX;

# keep comparing till one or the other runs out of data
while (defined $rowA and defined $rowB) {
    # get timestamps from rows, could use regex: m/10=\d\d\d\x01(\w+)/
    # but split on SOH and using last element probably does the job
    @els = split( /\x01/, $rowA);
    $tsA = pop @els;
    @els = split( /\x01/, $rowB);
    $tsB = pop @els;
    # on a tie, the row from fileB is written first
    if (compare_ts( $tsA, $tsB) < 0) {
        print $rowA;
        $rowA = shift @aFIX;
    } else {
        print $rowB;
        $rowB = shift @bFIX;
    }
}

# we've run out of data in fileA or B, so can dump the rest
if (defined $rowA) { print $rowA; print @aFIX; }
if (defined $rowB) { print $rowB; print @bFIX; }
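If the files turn out to be too big to slurp, the same merge can read one line at a time instead. This is only a sketch under the same assumptions: the timestamp is the last SOH-delimited field and is in a fixed-width format that sorts as a plain string. The sub names last_field and merge_logs are mine, not anything standard:

```perl
use strict;
use warnings;

# Hypothetical helper: timestamp assumed to be the last SOH-delimited field.
sub last_field {
    my ($row) = @_;
    chomp( my $copy = $row );
    return ( split /\x01/, $copy )[-1];
}

# Merge two already-sorted log filehandles onto $out, one line at a time.
sub merge_logs {
    my ( $fa, $fb, $out ) = @_;
    my $rowA = <$fa>;
    my $rowB = <$fb>;
    while ( defined $rowA and defined $rowB ) {
        # "le" keeps fileA's row first when the timestamps are identical
        if ( last_field($rowA) le last_field($rowB) ) {
            print {$out} $rowA;
            $rowA = <$fa>;
        }
        else {
            print {$out} $rowB;
            $rowB = <$fb>;
        }
    }
    # one input is exhausted; copy whatever is left of the other
    while ( defined $rowA ) { print {$out} $rowA; $rowA = <$fa>; }
    while ( defined $rowB ) { print {$out} $rowB; $rowB = <$fb>; }
    return;
}
```

Called as merge_logs( $fa, $fb, \*STDOUT ), this only ever holds one row from each file in memory.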
A Monk aims to give answers to those who have none, and to learn from those who know more.

In reply to Re: Need to efficiently merge 2 FIX protocol log files by space_monk
in thread Need to efficiently merge 2 FIX protocol log files by softwareCEO
