Re: compare data between two files using Perl

in reply to compare data between two files using Perl

What you really want here is a database. You can get a very lightweight one via the DBD::SQLite module. It is a DBI driver, so you'll need DBI as well (as you would for any database work in Perl).

You'll want to read each file in and store it in its own database table. I see that you have tab-separated files -- you would probably save yourself a lot of work by using Text::CSV_XS to parse those instead of doing it yourself.

Then, a simple query to the database will find mismatches.

Here's a general (not debugged) example:

use strict; use warnings;

use DBI;
use DBD::SQLite;
use IO::File;
use Text::CSV_XS;

my $db_file = 'ref_compare.db';
## binary => 1 lets the parser accept non-ASCII bytes in fields
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "\t" });

## remove the db file if it exists
unlink $db_file if -f $db_file;

## RaiseError makes any failed DBI call die instead of failing silently
my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '', { RaiseError => 1 });

## create two tables.
## 1: For brd_sym_pn
$dbh->do(q'
   CREATE TABLE brd_sym_pn ( refdes TEXT, pnum TEXT, pkgtype TEXT )
');
## 2: For sym_text_latest
$dbh->do(q'
   CREATE TABLE sym_text_latest ( 
      logpnpkg TEXT, logpnum TEXT, logpkgtype TEXT 
   )
');

## ok, now load brd_sym_pn
my $sth = $dbh->prepare(q'
   INSERT INTO brd_sym_pn (refdes,pnum,pkgtype)
   VALUES (?,?,?)
');

my $brd_sym_pn_io = IO::File->new('brd_sym_pn.txt', 'r')
   or die "Can't open brd_sym_pn.txt: $!";

## use $brd_sym_pn_io->getline to skip any "header" rows

## $csv->getline returns undef at end of file (or on a parse error)
while ( my $values = $csv->getline( $brd_sym_pn_io ) ) {
   for ( @$values ) { s/^\s+|\s+$//g } # trim lead/trail whitespace
   $sth->execute( @$values );  # inserts row into DB table
}


## ok, now load sym_text_latest
$sth = $dbh->prepare(q'
   INSERT INTO sym_text_latest (logpnpkg,logpnum,logpkgtype)
   VALUES (?,?,?)
');

my $sym_text_latest = IO::File->new('sym_text_latest.txt', 'r')
   or die "Can't open sym_text_latest.txt: $!";

## use $sym_text_latest->getline to skip any "header" rows

## $csv->getline returns undef at end of file (or on a parse error)
while ( my $values = $csv->getline( $sym_text_latest ) ) {
   for ( @$values ) { s/^\s+|\s+$//g } # trim lead/trail whitespace
   $sth->execute( @$values );  # inserts row into DB table
}

## now you can use any query you want, even in other scripts
## let's find everything where pnums match, but pkgtypes don't:
$sth = $dbh->prepare(q'
   SELECT refdes, pnum, pkgtype, logpnum, logpkgtype
   FROM   brd_sym_pn, sym_text_latest
   WHERE  brd_sym_pn.pnum = sym_text_latest.logpnum
   AND    brd_sym_pn.pkgtype != sym_text_latest.logpkgtype
');
$sth->execute();

# print the results out, one tab-separated row per line.
print join( "\t", qw/refdes pnum pkgtype logpnum logpkgtype/ ), "\n";
while ( my @row = $sth->fetchrow_array ) {
   print join( "\t", @row ), "\n";
}
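
To illustrate the "any query you want" point, here is a rough sketch of one more question you could ask of the same two tables: which board part numbers never show up in sym_text_latest at all? It runs against an in-memory SQLite database so it stands on its own. The sample rows and the unmatched_pnums helper are made up for illustration and are not part of your data or of any module.

```perl
use strict; use warnings;
use DBI;

## in-memory SQLite DB with made-up rows (illustration only)
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });

$dbh->do(q'CREATE TABLE brd_sym_pn ( refdes TEXT, pnum TEXT, pkgtype TEXT )');
$dbh->do(q'CREATE TABLE sym_text_latest ( logpnpkg TEXT, logpnum TEXT, logpkgtype TEXT )');

my $ins = $dbh->prepare(q'INSERT INTO brd_sym_pn VALUES (?,?,?)');
$ins->execute(@$_) for ( ['R1','PN-100','0402'], ['C7','PN-200','0603'] );

$ins = $dbh->prepare(q'INSERT INTO sym_text_latest VALUES (?,?,?)');
$ins->execute(@$_) for ( ['PN-100_0402','PN-100','0402'] );

## hypothetical helper: board rows whose pnum never appears as a logpnum
sub unmatched_pnums {
   my ($dbh) = @_;
   return $dbh->selectall_arrayref(q'
      SELECT refdes, pnum, pkgtype
      FROM   brd_sym_pn
      LEFT JOIN sym_text_latest
             ON brd_sym_pn.pnum = sym_text_latest.logpnum
      WHERE  sym_text_latest.logpnum IS NULL
      ORDER BY refdes
   ');
}

## one tab-separated line per unmatched board row
print join( "\t", @$_ ), "\n" for @{ unmatched_pnums($dbh) };
```

The LEFT JOIN keeps every brd_sym_pn row and fills the sym_text_latest columns with NULL where nothing matched, so the IS NULL test picks out exactly the unmatched parts.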

Of course, you could also simply store your first file in a hash, using partnums as keys -- that's just less flexible in terms of answering other questions about your data.
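
For comparison, a minimal (equally undebugged-against-your-data) sketch of the hash approach might look like the following. It assumes the rows have already been parsed into array refs in the same column order as the tables above. The sample rows and the pkgtype_mismatches helper are invented for the example.

```perl
use strict; use warnings;

## made-up rows standing in for the two parsed files:
## [refdes, pnum, pkgtype] and [logpnpkg, logpnum, logpkgtype]
my @brd_rows = ( ['R1','PN-100','0402'], ['C7','PN-200','0603'] );
my @sym_rows = ( ['PN-100_0402','PN-100','0402'],
                 ['PN-200_0805','PN-200','0805'] );

## hypothetical helper: rows where pnums match but pkgtypes don't
sub pkgtype_mismatches {
   my ($brd_rows, $sym_rows) = @_;

   ## index the first file by part number; a pnum may occur on
   ## several refdes, so each key holds a list
   my %brd_by_pnum;
   for my $row (@$brd_rows) {
      my ($refdes, $pnum, $pkgtype) = @$row;
      push @{ $brd_by_pnum{$pnum} }, [ $refdes, $pkgtype ];
   }

   ## walk the second file and look each part number up in the hash
   my @mismatches;
   for my $row (@$sym_rows) {
      my (undef, $logpnum, $logpkgtype) = @$row;
      for my $brd ( @{ $brd_by_pnum{$logpnum} || [] } ) {
         my ($refdes, $pkgtype) = @$brd;
         push @mismatches, [ $refdes, $logpnum, $pkgtype, $logpkgtype ]
            if $pkgtype ne $logpkgtype;
      }
   }
   return \@mismatches;
}

## one tab-separated line per mismatch
print join( "\t", @$_ ), "\n"
   for @{ pkgtype_mismatches(\@brd_rows, \@sym_rows) };
```

This answers the one question it was written for. Any new question (unmatched parts, duplicates, counts per package type) means writing another loop, where the database version only needs another query.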

That should give you a fair number of ideas.

<–radiant.matrix–>
Ramblings and references
“A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.” — Herm Albright
I haven't found a problem yet that can't be solved by a well-placed trebuchet
