If the files are not huge (i.e., at least one of them will fit in memory), I would go with the approach suggested by hippo: read file #1 into a hash keyed on the record key, then look up each line of file #2 in that hash.
If the files are too big for that, a slight modification: read file #1 and store only each line's seek (or tell) position in the hash. Then, for each line in file #2, use the stored position to seek straight to the corresponding line in file #1 and compare the two.
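A rough sketch of that second option, in case it helps. It assumes the same pipe-delimited layout as the sample data further down, and the sub name compare_by_offset is made up for this illustration. Only the byte offsets of file #1 live in memory; the line itself is re-read from disk when a matching key turns up in file #2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the second option: index file 1 by byte offset, not by content.
# The sub name compare_by_offset is hypothetical, chosen for this example.
sub compare_by_offset {
    my ($file1, $file2) = @_;
    my (%offset, @report);

    # Pass 1: remember where each key's line starts in file 1.
    open my $fh1, '<', $file1 or die "Cannot open $file1: $!";
    my $pos = tell $fh1;
    while (my $line = <$fh1>) {
        my ($key) = split /\|/, $line, 2;
        chomp $key;    # in case the line has no '|' separator
        $offset{$key} = $pos;
        $pos = tell $fh1;
    }

    # Pass 2: for each line of file 2, seek back into file 1 and compare.
    open my $fh2, '<', $file2 or die "Cannot open $file2: $!";
    while (my $line2 = <$fh2>) {
        chomp $line2;
        next if $line2 eq '';    # skip blank lines
        my ($key) = split /\|/, $line2, 2;
        if (!exists $offset{$key}) {
            push @report, "$key not found in $file1";
            next;
        }
        # 0 is SEEK_SET: position measured from the start of the file.
        seek $fh1, delete $offset{$key}, 0 or die "Cannot seek in $file1: $!";
        my $line1 = <$fh1>;
        chomp $line1;
        push @report, $line1 eq $line2 ? "$key - OK" : "$key data does not match";
    }
    close $fh2;
    close $fh1;

    # Any offsets still in the hash belong to keys never seen in file 2.
    push @report, map { "$_ not found in $file2" } sort keys %offset;
    return @report;
}

if (@ARGV == 2) {
    print "$_\n" for compare_by_offset(@ARGV);
}
```

The trade-off is one seek and one extra read per matching line, so expect it to be slower than the all-in-memory version; the hash still needs room for every key in file #1, just not the data.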
Update:
Sample of first option:
#!/usr/bin/perl
use strict;
use warnings;

my %FileInfo1 = ();
my ($inputFilename1, $inputFilename2, @otherParameters) = @ARGV;
die "Usage: $0 file1 file2\n" unless defined $inputFilename2;

# Read File 1 to Hash (key = first field, data = rest of the line)
open my $inputFile1, '<', $inputFilename1
    or die "Cannot open $inputFilename1: $!";
while (my $inputBuffer1 = <$inputFile1>)
{
    chomp $inputBuffer1;
    my ($key, $data) = split /\|/, $inputBuffer1, 2;
    $FileInfo1{$key} = $data;
}
close $inputFile1;

# Read File 2 and Compare
open my $inputFile2, '<', $inputFilename2
    or die "Cannot open $inputFilename2: $!";
while (my $inputBuffer2 = <$inputFile2>)
{
    chomp $inputBuffer2;
    my ($key, $data) = split /\|/, $inputBuffer2, 2;
    # exists, not defined: a key-only line stores undef as its data
    if (!exists $FileInfo1{$key})
    {
        print "$key not found in $inputFilename1\n";
    }
    elsif (($FileInfo1{$key} // '') ne ($data // ''))
    {
        print "$key data does not match\n";
        delete $FileInfo1{$key};
    }
    else
    {
        print "$key - OK\n";
        delete $FileInfo1{$key};
    }
}
close $inputFile2;

# Whatever is left in the hash never appeared in File 2
foreach my $leftoverKey (sort keys %FileInfo1)
{
    print "$leftoverKey not found in $inputFilename2\n";
}
exit;
__END__
C:\Steve\Dev\PerlMonks\P-2013-10-08@1245-TwoFile-Keyed-Compare>type test*.dat
test1.dat
A001|Steve|45
A002|George|32
A003|Alice|24
test2.dat
A001|Steve|45
A003|Alice|23
A004|Mike|48
C:\Steve\Dev\PerlMonks\P-2013-10-08@1245-TwoFile-Keyed-Compare>perl cmpfiles.pl test1.dat test2.dat
A001 - OK
A003 data does not match
A004 not found in test1.dat
A002 not found in test2.dat