PerlMonks  

Findding missing record between 2 files ?

by bh_perl (Monk)
on Jan 06, 2011 at 11:37 UTC (#880795=perlquestion)
bh_perl has asked for the wisdom of the Perl Monks concerning the following question:


Hi, I have two sample files to compare, fileA.txt and fileB.txt. I want to compare them and print only the data that is in fileA.txt but not found in fileB.txt.

But my code looks incorrect: it displays the missing data from both files. How can I solve this? Please help me.
#!/usr/bin/perl -w
use strict;
use warnings;
use POSIX qw(floor ceil);
use Getopt::Long;
use Cwd;

my ($help,$file1,$file2,$outputdir,$trace,$second,$minute);
my $input = getcwd;
my $output = getcwd;

GetOptions (
    "h"   => \$help,
    "t"   => \$trace,
    "1=s" => \$file1,
    "s"   => \$second,
    "m"   => \$minute,
    "2=s" => \$file2,
    "o=s" => \$outputdir
) or usage();

sub usage {
    exit;
}

my ($nm,$dt,$ext1,$ext2);
my ($match,$tdate,$ttime,$tanum,$tbnum,$tdur,$tamount,$tremarks,$tstd,$th,$tm,$ts,$tt);
my ($cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd,$ch,$cm,$cs,$ct);

if (defined($outputdir)) {
    my @tmp = split(/_/, $file1);
    @tmp = reverse(@tmp);
    my $match_file = "$outputdir/DATA_MATCH_$tmp[1]_$tmp[0]";
    my $diff_file = "$outputdir/DATA_MISS_$tmp[1]_$tmp[0]";
    open (MATCH, ">> $match_file") or die ("Can't open $match_file\n");
    open (MISS, ">> $diff_file") or die ("Can't open $diff_file\n");
}

my %status = ();
my ($tk,$ta);

open (FILE1, "$file1") or die ("Can't open $file1\n");
while (my $data = <FILE1>) {
    chomp($data);
    $data =~ s/\"//g;
    ($tdate,$ttime,$tanum,$tbnum,$tdur,$tamount,$tremarks,$tstd) = (split(/\,/, $data))[0,1,2,3,4,5,6,7];
    ($th,$tm,$ts) = (split(/:/, $ttime))[0,1,2];
    if (defined($minute)) {
        #$tk = floor("$tm" / 10 + 0.5)*10;
        $tk = floor("$tm" / 10)*10;
        if ($tk eq 60) {
            $th++;
            if ($th eq 24) {
                $tt = sprintf ("%02s%02s00","00","00");
            } else {
                $tt = sprintf ("%02s%02s00",$th,"00");
            }
        } else {
            $tt = sprintf ("%02s%02s00",$th,$tk);
        }
    } else {
        #$tk = floor("$tm.$ts" + 0.5);
        $tk = floor("$tm.$ts");
        $tt = sprintf ("%02s%02s00\n",$th,$tk);
    }
    $tanum = substr($tanum,1) if ($tanum =~ /^6/);
    $status{$tanum.$tt} = 1;
}
close(FILE1);

open (FILE2, "$file2") or die ("Can't open $file2\n");
while (my $cdata = <FILE2>) {
    chomp($cdata);
    my $xx;
    $cdata =~ s/\"//g;
    ($cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd) = (split(/\,/, $cdata))[0,1,2,3,4,5,6,7];
    ($th,$tm,$ts) = (split(/:/, $ctime))[0,1,2];
    if (defined($minute)) {
        #$tk = floor("$tm" / 10 + 0.5)*10;
        $tk = floor("$tm" / 10)*10;
        if ($tk eq 60) {
            $th++;
            if ($th eq 24) {
                $tt = sprintf ("%02s%02s00","00","00");
            } else {
                $tt = sprintf ("%02s%02s00",$th,"00");
            }
        } else {
            $tt = sprintf ("%02s%02s00",$th,$tk);
        }
    } else {
        #$tk = floor("$tm.$ts" + 0.5);
        $tk = floor("$tm.$ts");
        $tt = sprintf ("%02s%02s00\n",$th,$tk);
    }
    $xx = $status{$canum.$tt};
    if (defined($xx)) {
        print "MATCH,$file1,$file2,$cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd\n" if (defined($trace));
        print MATCH "$file1,$file2,$cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd\n" if (defined($outputdir));
    } else {
        print "MISS,$file1,$file2,$cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd\n" if (defined($trace));
        print MISS "$file1,$file2,$cdate,$ctime,$canum,$cbnum,$cdur,$camount,$cremarks,$cstd\n" if (defined($outputdir));
    }
}
close(FILE2);
close (MATCH) if (defined($outputdir));
close (MISS) if (defined($outputdir));

Re: Findding missing record between 2 files ?
by Anonymous Monk on Jan 06, 2011 at 12:08 UTC
    But my code looks incorrect: it displays the missing data from both files. How can I solve this? Please help me.

    Looks can be deceiving; try running the code.

Re: Findding missing record between 2 files ?
by k_manimuthu (Monk) on Jan 06, 2011 at 12:39 UTC

    File A | File B
    A      | A
    B      | B
    C      | -
    D      | -
    -      | E
    -      | F
    G      | G

    For the above sample you would expect C,D, but your script gives C,D,E,F.
    You process File A and hold its data in %status.

    While processing File B you work against the same hash and check whether the key exists. In that situation the output contains both File A and File B contents.

    To avoid this, hold File A's contents in %hash_one and File B's contents in another hash (%hash_two).
    Then compare the keys of %hash_one against %hash_two. That yields C,D.

    Pseudo code

    Process File A contents and store into %hash_one
    Process File B contents and store into %hash_two
    foreach $key (keys %hash_one) {
        if (! exists $hash_two{$key}) {
            print "\nFile A contents ", $hash_one{$key}, " missed at File B";
        }
    }
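The two-hash pseudocode above can be sketched as a runnable script. This is only a minimal illustration under some assumptions: the comparison key is taken to be the whole line (the original script builds its key from a number and a rounded time), the sample data is held in in-memory strings instead of fileA.txt and fileB.txt, and the helper names `load_keys` and `only_in_first` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash whose keys are the records read from a filehandle.
# Assumption: the whole chomped line is the comparison key.
sub load_keys {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # skip empty lines
        $seen{$line} = 1;
    }
    return \%seen;
}

# Return, in sorted order, the keys present in the first hash
# but absent from the second.
sub only_in_first {
    my ($first, $second) = @_;
    my @missing = grep { !exists $second->{$_} } sort keys %$first;
    return @missing;
}

# In-memory stand-ins for fileA.txt and fileB.txt (the A..G sample above).
my $text_a = "A\nB\nC\nD\nG\n";
my $text_b = "A\nB\nE\nF\nG\n";

open my $fh_a, '<', \$text_a or die "Can't read sample A: $!\n";
open my $fh_b, '<', \$text_b or die "Can't read sample B: $!\n";

my $hash_one = load_keys($fh_a);
my $hash_two = load_keys($fh_b);

close $fh_a;
close $fh_b;

# Report only the records that File A has and File B lacks.
print "File A contains $_, but File B does not\n"
    for only_in_first($hash_one, $hash_two);
```

With real files, the two `open` calls would take the file names instead of references to strings; the comparison itself stays the same.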
      (fixed typo in subject)

      As long as the files aren't large, that's fine, except that you're not finding keys present in file B but not in file A.

      You're using 2 hashes where one would suffice, so you can cut your memory usage in half.

      Pseudo code:

      Process File A contents and store in %hash_one
      while (<FileB>) {
          # do whatever you need to to compute $key
          if (! exists $hash_one{$key}) {
              print "File B contains $key but File A does not\n";
          }
          delete $hash_one{$key};
      }
      # the keys left in %hash_one weren't in File B
      foreach my $key (keys %hash_one) {
          print "File A contains $key, but File B does not\n";
      }

      Mike
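The single-hash idea above could look roughly like the sketch below. It is an illustration, not the poster's exact code: the function name `diff_keys` is hypothetical, keys are passed in as plain lists rather than computed from file records, and matched keys are marked instead of deleted. Marking is a small departure from the pseudocode, chosen here so that a key that appears twice in File B is not reported as missing from File A on its second appearance.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two lists of keys using a single hash built from list A.
# Returns two array refs: keys only in A (sorted), and keys only in B
# (in the order they were encountered).
sub diff_keys {
    my ($keys_a, $keys_b) = @_;

    # Value 0 means "present in A, not yet matched by anything in B".
    my %in_a = map { $_ => 0 } @$keys_a;

    my @only_b;
    for my $key (@$keys_b) {
        if (exists $in_a{$key}) {
            $in_a{$key} = 1;        # matched: mark it rather than delete it
        }
        else {
            push @only_b, $key;     # B has it, A does not
        }
    }

    # Whatever is still unmarked was never seen in B.
    my @only_a = grep { !$in_a{$_} } sort keys %in_a;

    return (\@only_a, \@only_b);
}

# The A..G sample from earlier in the thread.
my ($only_a, $only_b) = diff_keys([qw(A B C D G)], [qw(A B E F G)]);

print "File A contains $_, but File B does not\n" for @$only_a;
print "File B contains $_, but File A does not\n" for @$only_b;
```

If only the records missing from File B are wanted, as in the original question, the second `print` can simply be dropped.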
Re: Finding missing record between 2 files ?
by toolic (Chancellor) on Jan 06, 2011 at 13:56 UTC
    To get more specific help, you should update your post with a small sample of your input files (a few lines each), your actual output and your expected output.

    Probably unrelated to your problem... if you are interested in using a more robust (and modern) Perl coding style, run your code through perlcritic. Here are a few of the issues:

    'Bareword file handle opened at line 37, column 2. See pages 202,204 of PBP. (Severity: 5)',
    'Two-argument "open" used at line 37, column 2. See page 207 of PBP. (Severity: 5)',
    'Close filehandles as soon as possible after opening them at line 37, column 2. See page 209 of PBP. (Severity: 4)',
    '"$nm" is declared but not used at line 27, column 1. Unused variables clutter code and make it harder to read. (Severity: 3)',
    'Mismatched operator at line 53, column 21. Numeric/string operators and operands should match. (Severity: 3)',
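As a rough illustration of how those findings might be addressed, the sketch below uses a lexical filehandle with three-argument `open`, closes the handle as soon as reading is done, and compares numbers with `==` instead of `eq`. The helper names `read_lines` and `is_sixty` are invented for this example, and the input is an in-memory string standing in for a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every line of a file through a lexical filehandle ($fh) and
# three-argument open, then close the handle right away.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh or die "Can't close $path: $!\n";
    return @lines;
}

# Numeric values are compared with ==; eq is for strings.
sub is_sixty {
    my ($minutes) = @_;
    return $minutes == 60 ? 1 : 0;
}

# Demo input: a reference to a string is opened as an in-memory file.
# With a real file, pass its name instead.
my $sample = "first record\nsecond record\n";
my @lines  = read_lines(\$sample);

print scalar(@lines), " lines read\n";
print "60 rolls over to the next hour\n" if is_sixty(60);
```

Because `$fh` is lexical, it is also closed automatically when it goes out of scope, but the explicit `close` makes the intent clear and lets errors be reported.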
