Re: Need to efficiently merge 2 FIX protocol log files

by space_monk (Chaplain)
on Jan 09, 2013 at 10:53 UTC (#1012426)

in reply to Need to efficiently merge 2 FIX protocol log files

Okay, let's assume you have enough memory to hold the entire contents of both files in RAM; the simplest approach is probably to create a hash with the timestamp as the key (keeping a list of rows per key, since timestamps can repeat), and sort on that key.
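The hash-based approach described above could be sketched roughly like this. It is only an illustration under assumptions: the helper names ts_of and merge_by_hash are made up for this example, the timestamp is assumed to be the last SOH-delimited field of each row, and timestamps are assumed to be fixed-width strings that sort correctly with a plain string sort.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes the timestamp is the last
# SOH-delimited field of a row.
sub ts_of {
    my ($row) = @_;
    my $ts = (split /\x01/, $row)[-1];
    chomp $ts;    # drop the trailing newline from the copy only
    return $ts;
}

# Hypothetical helper: group rows by timestamp, then emit the
# groups in sorted key order. Each key holds an array of rows so
# that rows sharing a timestamp are not overwritten.
sub merge_by_hash {
    my @rows = @_;
    my %by_ts;
    push @{ $by_ts{ ts_of($_) } }, $_ for @rows;
    # string sort: assumes fixed-width, lexically sortable timestamps
    return map { @{ $by_ts{$_} } } sort keys %by_ts;
}
```

With both files slurped into @aFIX and @bFIX, the call would be something like print merge_by_hash(@aFIX, @bFIX); rows with the same timestamp come out in the order they were passed in.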

However, this doesn't take into account the fact that your original files are probably already sorted, so your algorithm should probably be to compare the leading row of each file and write the one with the older timestamp, repeating until you run out of data. Obviously such code has to take into account the possibility of identical timestamps, one file being shorter than another ... yada yada yada.

This is roughly how to do it; it is not completely debugged Perl, as I'm short on time:

use strict;
use warnings;

# Compare two timestamps; returns -1, 0 or 1.
# Assumes fixed-width timestamp strings that sort correctly as text;
# change this if your timestamps need parsing first.
sub compare_ts {
    my ($ts1, $ts2) = @_;
    return $ts1 cmp $ts2;
}

# open the files...
open my $fa, '<', 'fileA' or die "Cannot open fileA: $!";
open my $fb, '<', 'fileB' or die "Cannot open fileB: $!";

# slurp....
my @aFIX = <$fa>;
my @bFIX = <$fb>;
close $fa;
close $fb;

# prime each compare value
my $rowA = shift @aFIX;
my $rowB = shift @bFIX;

# keep comparing till one or the other runs out of data
while (defined $rowA and defined $rowB) {
    # get timestamps from rows, could use regex: m/10=\d\d\d\x01(\w+)/
    # but splitting on SOH and using the last element probably does the job
    my $tsA = (split /\x01/, $rowA)[-1];
    my $tsB = (split /\x01/, $rowB)[-1];

    # on identical timestamps the row from fileA is written first
    if (compare_ts($tsA, $tsB) <= 0) {
        print $rowA;
        $rowA = shift @aFIX;
    }
    else {
        print $rowB;
        $rowB = shift @bFIX;
    }
}

# we've run out of data in fileA or fileB, so can dump the rest
if (defined $rowA) {
    print $rowA;
    print @aFIX;
}
if (defined $rowB) {
    print $rowB;
    print @bFIX;
}
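If the "enough memory" assumption does not hold, the same compare-the-leading-rows loop can read one row at a time from each handle instead of slurping. The following is a minimal sketch, not tested against real FIX logs: last_field and merge_streams are names invented for this example, the timestamp is again assumed to be the last SOH-delimited field, and a plain string comparison is assumed to order the timestamps correctly.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes the timestamp is the last
# SOH-delimited field of a row.
sub last_field {
    my ($row) = @_;
    chomp $row;
    my $ts = (split /\x01/, $row)[-1];
    return $ts;
}

# Hypothetical helper: merge two already-sorted input handles into
# $out, holding only one row per input in memory at any time.
sub merge_streams {
    my ($fa, $fb, $out) = @_;
    my $rowA = <$fa>;
    my $rowB = <$fb>;

    while (defined $rowA and defined $rowB) {
        # ties go to the first handle, so the merge is stable
        if ((last_field($rowA) cmp last_field($rowB)) <= 0) {
            print {$out} $rowA;
            $rowA = <$fa>;
        }
        else {
            print {$out} $rowB;
            $rowB = <$fb>;
        }
    }

    # one input is exhausted; copy whatever remains of the other
    for my $pair ([$rowA, $fa], [$rowB, $fb]) {
        my ($row, $fh) = @$pair;
        next unless defined $row;
        print {$out} $row;
        while (defined(my $line = <$fh>)) {
            print {$out} $line;
        }
    }
    return;
}
```

It would be called with two handles opened for reading and an output handle, for example merge_streams($fa, $fb, \*STDOUT).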
A Monk aims to give answers to those who have none, and to learn from those who know more.
