I have two files containing thousands, potentially millions, of lines of comma-separated records. Each line is a stock market symbol followed by its various "tick" data for the moment in time indicated by the timestamp in column #4. The lines in FILE2 should match the lines in FILE1, but there will be instances where they don't, and I would like to put together a script that will determine the following:
1.) Using the "SEQUENCE NUMBER" (the typically 7-digit number found in column #2) as the key, which lines in FILE1 are not found in FILE2.
2.) If the SEQ# is found in FILE2, continue on to compare each remaining element of the shared record (TYPE in FILE1 to TYPE in FILE2, BID to BID, SIZE to SIZE, etc.)
From what I gather, I will need to create at least one HASH to perform this action. Using code examples found on the web, I know how to manually create a very basic HASH. What I don't know how to do is:
- Import a file into a HASH, using one element as the key and the remaining elements as individual values assigned to that key. At best, I think I've only been able to import each element of each line as its own key.
- The verbiage and format needed to articulate comparisons between elements. This is what confuses me the most.
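For the first point, one common approach is a "hash of hashes": the outer key is the sequence number and the value is a small hash holding the named fields of that record. The following is only a minimal sketch under some assumptions: the field names in @FIELDS and the sub name load_records are made up for illustration, the column order is taken from the LEGEND, and the sequence number alone is assumed to identify a record (the sample FILE2 contains seq 2341319 as both a T and a Q line, so a combined key such as "seq,type" may be safer in practice). Trade (T) lines also appear to use a different column layout, which this sketch does not handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Names for the first 10 columns, following the LEGEND in the post.
# The names themselves (type2, bidSize, ...) are illustrative choices.
my @FIELDS = qw(symbol seqNum type timestamp type2 status bid bidSize ask askSize);

# Read comma-separated records from a filehandle and return a reference to a
# hash keyed by sequence number; each value is a hash of the named fields.
sub load_records {
    my ($fh) = @_;
    my %records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;               # skip blank lines
        my @cols = split /,/, $line;
        my %rec;
        @rec{@FIELDS} = @cols[0 .. $#FIELDS];   # hash slice: field name => column
        $records{ $rec{seqNum} } = \%rec;       # sequence number is the key
    }
    return \%records;
}

# Demo: two quote lines from FILE1, read through an in-memory filehandle
# so the sketch runs without any external file.
my $sample = <<'END';
ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,656,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
END

open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
my $records = load_records($fh);
close($fh);

for my $seq (sort keys %$records) {
    print "$seq: bid=$records->{$seq}{bid} askSize=$records->{$seq}{askSize}\n";
}
```

With a real file, the same sub could be called on a handle opened with open(my $fh, '<', $inFile). An individual value is then reached as $records->{2285319}{bid}.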
LEGEND (only the first 10 elements of each line concern me): SYMBOL,SEQUENCE#,TYPE (Quote or Trade or Custom),TIMESTAMP,TYPE,STATUS,BID,BID-SIZE,ASK,ASK-SIZE
FILE1:
ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285247,T,13:58:49.986000,SELL,1549.250000,2,0,1738560,,U
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,656,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
ESM3,2285321,Q,13:58:50.750000,Q,WIDE,1549.250000,655,1549.500000,524,0.000000,0.000000,0.000000,104,68,N,CME,CME
ESM3,2285325,Q,13:58:50.801000,Q,WIDE,1549.250000,655,1549.500000,522,0.000000,0.000000,0.000000,104,67,N,CME,CME
ESM3,2285326,Q,13:58:50.802000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285328,Q,13:58:50.831000,Q,WIDE,1549.250000,667,1549.500000,522,0.000000,0.000000,0.000000,106,67,N,CME,CME
ESM3,2285329,Q,13:58:50.832000,Q,WIDE,1549.250000,1504,1549.500000,522,0.000000,0.000000,0.000000,107,67,N,CME,CME
ESM3,2285330,Q,13:58:50.833000,Q,WIDE,1549.250000,1505,1549.500000,522,0.000000,0.000000,0.000000,108,67,N,CME,CME
ESM3,2285331,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,522,0.000000,0.000000,0.000000,109,67,N,CME,CME
ESM3,2285332,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,520,0.000000,0.000000,0.000000,109,66,N,CME,CME
ESM3,2285333,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,519,0.000000,0.000000,0.000000,109,65,N,CME,CME
ESM3,2285334,Q,13:58:50.833000,Q,WIDE,1549.250000,1507,1549.500000,519,0.000000,0.000000,0.000000,110,65,N,CME,CME
FILE2:
ESM3,2341309,Q,14:13:42.044000,Q,WIDE,1550.000000,555,1550.250000,834,0.000000,0.000000,0.000000,140,76,N,CME,CME
ESM3,2341311,Q,14:13:42.445000,Q,WIDE,1550.000000,554,1550.250000,834,0.000000,0.000000,0.000000,139,76,N,CME,CME
ESM3,2341312,Q,14:13:42.445000,Q,WIDE,1550.000000,554,1550.250000,833,0.000000,0.000000,0.000000,139,75,N,CME,CME
ESM3,2341313,Q,14:13:42.544000,Q,WIDE,1550.000000,550,1550.250000,833,0.000000,0.000000,0.000000,138,75,N,CME,CME
ESM3,2341314,Q,14:13:42.544000,Q,WIDE,1550.000000,551,1550.250000,833,0.000000,0.000000,0.000000,139,75,N,CME,CME
ESM3,2341315,Q,14:13:42.544000,Q,WIDE,1550.000000,551,1550.250000,834,0.000000,0.000000,0.000000,139,76,N,CME,CME
ESM3,2341316,Q,14:13:42.666000,Q,WIDE,1550.000000,552,1550.250000,834,0.000000,0.000000,0.000000,140,76,N,CME,CME
ESM3,2341317,Q,14:13:42.809000,Q,WIDE,1550.000000,552,1550.250000,837,0.000000,0.000000,0.000000,140,77,N,CME,CME
ESM3,2341319,T,14:13:42.851000,SELL,1550.000000,5,0,1786787,,U
ESM3,2341319,Q,14:13:42.851000,Q,WIDE,1550.000000,547,1550.250000,837,0.000000,0.000000,0.000000,140,77,N,CME,CME
ESM3,2341320,Q,14:13:42.864000,Q,WIDE,1550.000000,542,1550.250000,837,0.000000,0.00000
I'm not exactly new to Perl, though I've only used it for very basic data manipulation or searches (where shell scripting would probably have been completely adequate, but I have almost no experience with shell scripting). Comparing multiple values in two different files has been absolutely puzzling to me.
Below is the closest I could get to importing anything into the HASH, using just one file as an example. The problem is, I've no idea what's being used for the key, and I've no idea how to initiate a comparison between this and a second file.
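#!/usr/bin/perl
use warnings;
use strict;

my $inFile = "CME.ESM3.MKD11.out";
open(my $fh1, '<', $inFile)
    or die("Can't open input file \"$inFile\": $!\n");

my %hash;
while (my $line = <$fh1>) {
    chomp $line;                      # chomp the line just read, not $_
    my @fields = split /,/, $line;    # keep the result of split in an array
    # The keys here are fixed names (symbol, seqNum, ...), so each new line
    # overwrites the values stored for the previous line.
    $hash{symbol}    = $fields[0];
    $hash{seqNum}    = $fields[1];
    $hash{type}      = $fields[2];
    $hash{timestamp} = $fields[3];
    $hash{status}    = $fields[5];
    $hash{bid}       = $fields[6];
    $hash{bidVol}    = $fields[7];
    $hash{ask}       = $fields[8];
    $hash{askVol}    = $fields[9];
    for my $key (keys %hash) {
        print "$key=$hash{$key}\t";
    }
    print "\n";
}
close($fh1);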
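To make the intended comparison concrete, here is a rough sketch of how both checks (sequence numbers missing from FILE2, then field-by-field differences for the shared ones) might look once each file is loaded into a hash keyed by sequence number. The sub names load_records and compare_records and the field names are hypothetical, and the "FILE2" data in the demo is invented by altering lines from FILE1 purely for illustration. Fields are compared as strings, so 1549.25 and 1549.250000 would count as different. The // operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative names for the first 10 columns, following the LEGEND.
my @FIELDS = qw(symbol seqNum type timestamp type2 status bid bidSize ask askSize);

# Load records from a filehandle into a hash keyed by sequence number.
sub load_records {
    my ($fh) = @_;
    my %records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my @cols = split /,/, $line;
        my %rec;
        @rec{@FIELDS} = @cols[0 .. $#FIELDS];
        $records{ $cols[1] } = \%rec;
    }
    return \%records;
}

# Return (\@missing, \%diffs): sequence numbers in $first that are absent
# from $second, and per-sequence lists of fields whose values differ.
sub compare_records {
    my ($first, $second) = @_;
    my (@missing, %diffs);
    for my $seq (sort { $a <=> $b } keys %$first) {
        if (!exists $second->{$seq}) {
            push @missing, $seq;
            next;
        }
        for my $field (@FIELDS) {
            my $x = $first->{$seq}{$field}  // '';
            my $y = $second->{$seq}{$field} // '';
            push @{ $diffs{$seq} }, "$field: '$x' vs '$y'" if $x ne $y;
        }
    }
    return (\@missing, \%diffs);
}

# Demo data: three lines from FILE1 in the post.
my $file1 = <<'END';
ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,656,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
ESM3,2285321,Q,13:58:50.750000,Q,WIDE,1549.250000,655,1549.500000,524,0.000000,0.000000,0.000000,104,68,N,CME,CME
END

# Made-up second file: 2285320 has a changed bid size, 2285321 is absent.
my $file2 = <<'END';
ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,650,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
END

open(my $fh1, '<', \$file1) or die "Can't open in-memory FILE1: $!";
open(my $fh2, '<', \$file2) or die "Can't open in-memory FILE2: $!";
my ($missing, $diffs) = compare_records(load_records($fh1), load_records($fh2));
close($fh1);
close($fh2);

print "Missing from FILE2: $_\n" for @$missing;
for my $seq (sort { $a <=> $b } keys %$diffs) {
    print "$seq differs: ", join('; ', @{ $diffs->{$seq} }), "\n";
}
```

On the demo data this reports 2285321 as missing from FILE2 and a bidSize difference (656 vs 650) for 2285320. Holding both files in memory may be too heavy for millions of lines; one possible variation is to load only FILE2 into the hash and stream FILE1 line by line against it.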
Any help would be greatly appreciated.
Thanks