PerlMonks  

Re: compare data between two files using Perl

by radiantmatrix (Parson)
on Jun 16, 2008 at 20:40 UTC (#692362)


in reply to compare data between two files using Perl

The solution you really want is a database. You can get a very lightweight one via the DBD::SQLite module (you'll also want DBI if you do anything with a database).

You'll want to read your file in and store it in a database. I see that you have tab-separated files -- you probably would save yourself a lot of work by using Text::CSV_XS to parse those instead of doing it yourself.

Then, a simple query to the database will find mismatches.

Here's a general (not debugged) example:

    use strict;
    use warnings;
    use DBI;
    use DBD::SQLite;
    use IO::File;
    use Text::CSV_XS;

    my $db_file = 'ref_compare.db';
    my $csv = Text::CSV_XS->new({ sep_char => "\t" });

    ## remove the db file if it exists
    unlink $db_file if -f $db_file;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '')
        or die "Cannot open $db_file: $DBI::errstr";

    ## create two tables.
    ## 1: For brd_sym_pn
    $dbh->do(q'
        CREATE TABLE brd_sym_pn (
            refdes  TEXT,
            pnum    TEXT,
            pkgtype TEXT
        )
    ');

    ## 2: For sym_text_latest
    $dbh->do(q'
        CREATE TABLE sym_text_latest (
            logpnpkg   TEXT,
            logpnum    TEXT,
            logpkgtype TEXT
        )
    ');

    ## ok, now load brd_sym_pn
    my $sth = $dbh->prepare(q'
        INSERT INTO brd_sym_pn (refdes,pnum,pkgtype) VALUES (?,?,?)
    ');
    my $brd_sym_pn_io = IO::File->new('brd_sym_pn.txt')
        or die "Cannot open brd_sym_pn.txt: $!";
    ## use $brd_sym_pn_io->getline to skip any "header" rows
    until ( $brd_sym_pn_io->eof ) {
        my $values = $csv->getline( $brd_sym_pn_io );  # parse data line
        next unless $values;                           # skip lines that fail to parse
        for ( @$values ) { s/^\s+|\s+$//g }            # trim lead/trail whitespace
        $sth->execute( @$values );                     # inserts row into DB table
    }

    ## ok, now load sym_text_latest
    $sth = $dbh->prepare(q'
        INSERT INTO sym_text_latest (logpnpkg,logpnum,logpkgtype) VALUES (?,?,?)
    ');
    my $sym_text_latest = IO::File->new('sym_text_latest.txt')
        or die "Cannot open sym_text_latest.txt: $!";
    ## use $sym_text_latest->getline to skip any "header" rows
    until ( $sym_text_latest->eof ) {
        my $values = $csv->getline( $sym_text_latest );  # parse data line
        next unless $values;                             # skip lines that fail to parse
        for ( @$values ) { s/^\s+|\s+$//g }              # trim lead/trail whitespace
        $sth->execute( @$values );                       # inserts row into DB table
    }

    ## now you can use any query you want, even in other scripts
    ## let's find everything where pnums match, but pkgtypes don't:
    $sth = $dbh->prepare(q'
        SELECT refdes, pnum, pkgtype, logpnum, logpkgtype
        FROM brd_sym_pn, sym_text_latest
        WHERE brd_sym_pn.pnum = sym_text_latest.logpnum
          AND brd_sym_pn.pkgtype != sym_text_latest.logpkgtype
    ');
    $sth->execute();

    # print the results out, one tab-separated row per line.
    print join("\t", qw/refdes pnum pkgtype logpnum logpkgtype/), "\n";
    while ( my @row = $sth->fetchrow_array ) {
        print join("\t", @row), "\n";
    }
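Once the two tables are loaded, other questions are just other queries. As a rough sketch (not part of the original example), here is one way to list board parts whose part number never shows up in sym_text_latest, using a LEFT JOIN. The table and column names come from the code above; the in-memory database and the sample rows are made up purely so the snippet runs on its own.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database, so this sketch leaves no file behind.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 } );

$dbh->do('CREATE TABLE brd_sym_pn (refdes TEXT, pnum TEXT, pkgtype TEXT)');
$dbh->do('CREATE TABLE sym_text_latest (logpnpkg TEXT, logpnum TEXT, logpkgtype TEXT)');

# Made-up rows for illustration only; real data would come from the files.
my $ins_brd = $dbh->prepare('INSERT INTO brd_sym_pn VALUES (?,?,?)');
$ins_brd->execute(@$_) for (
    [ 'R1', 'PN100', '0603'  ],
    [ 'C1', 'PN200', '0805'  ],
    [ 'U1', 'PN300', 'SOIC8' ],
);
my $ins_sym = $dbh->prepare('INSERT INTO sym_text_latest VALUES (?,?,?)');
$ins_sym->execute(@$_) for (
    [ 'PN100_0603', 'PN100', '0603' ],
    [ 'PN200_1206', 'PN200', '1206' ],
);

# Rows of brd_sym_pn with no matching logpnum: the LEFT JOIN keeps every
# board row, and unmatched ones have NULL in the joined columns.
my $missing = $dbh->selectall_arrayref(q{
    SELECT b.refdes, b.pnum, b.pkgtype
    FROM brd_sym_pn AS b
    LEFT JOIN sym_text_latest AS s ON s.logpnum = b.pnum
    WHERE s.logpnum IS NULL
    ORDER BY b.refdes
});

print join("\t", @$_), "\n" for @$missing;
```

With the sample rows above, only U1 (PN300) has no counterpart, so that is the single row printed. Against the real ref_compare.db you would connect to the file instead of :memory: and skip the CREATE/INSERT steps.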

Of course, you could also simply store your first file in a hash, using partnums as keys -- that's just less flexible in terms of answering other questions about your data.
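For comparison, the hash approach might look something like the sketch below. It is only an illustration under a few assumptions: the column order (refdes, pnum, pkgtype and logpnpkg, logpnum, logpkgtype) is taken from the table definitions above, the helper names parse_line, index_by_pnum and find_mismatches are made up, the sample rows are invented, and a plain split on tabs stands in for Text::CSV_XS to keep the snippet dependency-free.

```perl
use strict;
use warnings;

# Split one tab-separated line into fields and trim each field.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;           # trim lead/trail whitespace
    return @fields;
}

# Index the first file by part number. Several refdes values can share one
# part number, so each key holds a list of rows rather than a single row.
sub index_by_pnum {
    my ($fh) = @_;
    my %by_pnum;
    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;        # skip blank lines
        my ( $refdes, $pnum, $pkgtype ) = parse_line($line);
        next unless defined $pkgtype;     # skip rows with too few columns
        push @{ $by_pnum{$pnum} }, { refdes => $refdes, pkgtype => $pkgtype };
    }
    return \%by_pnum;
}

# Walk the second file and collect rows whose part number is known but
# whose package type differs.
sub find_mismatches {
    my ( $by_pnum, $fh ) = @_;
    my @mismatches;
    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;
        my ( undef, $logpnum, $logpkgtype ) = parse_line($line);
        next unless defined $logpkgtype;
        for my $row ( @{ $by_pnum->{$logpnum} || [] } ) {
            next if $row->{pkgtype} eq $logpkgtype;
            push @mismatches,
                [ $row->{refdes}, $logpnum, $row->{pkgtype}, $logpkgtype ];
        }
    }
    return @mismatches;
}

# Made-up sample rows, held in strings so the sketch runs without files.
my $brd = "R1\tPN100\t0603\nR2\tPN100\t0603\nC1\tPN200\t0805\n";
my $sym = "PN100_0603\tPN100\t0603\nPN200_1206\tPN200\t1206\n";

open my $brd_fh, '<', \$brd or die "Cannot open in-memory file: $!";
open my $sym_fh, '<', \$sym or die "Cannot open in-memory file: $!";

my @mismatches = find_mismatches( index_by_pnum($brd_fh), $sym_fh );
print join( "\t", @$_ ), "\n" for @mismatches;
```

With the sample rows, the only mismatch reported is C1 (PN200: 0805 on the board side versus 1206 in the other file). Note that this answers exactly one question; each new question about the data means writing another loop, which is where the database version pays off.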

That should give you a fair number of ideas.

<radiant.matrix>
Ramblings and references
“A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.” Herm Albright
I haven't found a problem yet that can't be solved by a well-placed trebuchet

