Compare multiple files and output differences

prescott2006 has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script that reads a text file, say a.txt, which contains a list of text files to be analyzed, say b.txt, c.txt and d.txt. Each of b, c and d will be opened and scanned to capture a certain string using pattern matching. The results will be stored in a corresponding log file. After that, b.log will be treated as the reference: c.log and d.log will be compared with b.log, and their missing and extra entries will be output to result.list. This is the code I have written so far.

#!/usr/bin/perl

print "Enter filename:";
chomp($fname = <STDIN>);

# Open the file which contains list of files to be opened.  If unsuccessful, print an error message and quit.

open (file1,$fname) || die "Can't Open File: $fname\n";

while (<file1>)
{
  chomp;
  open (file2, $_);   # Open file in the list one by one
  $temp = $_;         # Store the file name to name the corresponding log file
  print "$temp\n";    # Print the file name to indicate the different output
  while (<file2>)
  {
    chomp;
    sub grep_pattern    # Print strings which contain the pattern
    {foreach ($_)
    {if (/$pattern/)
     {  print "$1\n";      
        open (file3, ">>$temp.log");    
        print file3 "$1\n";     # Print result in log file in append mode
        close (file3);    
     }
    }
    }
    $pattern = 'hello '; # Find the pattern
    grep_pattern;
  }
  print "\n";   # Print a blank line to separate the results from each file
}
close (file1);

Ideally, though, the log files should not be needed: the intermediate results could be stored in arrays or hashes and compared directly. But I don't know how to create a separate array for each of b.txt, c.txt and d.txt automatically. Can anyone help?


Replies are listed 'Best First'.
Re: Compare multiple files and output differences by rovf (Priest) on Mar 08, 2012 at 08:45 UTC
As for your question about calculating the differences, you could for instance use Text::Diff from CPAN. It has a fairly general interface, and you can also use it for data stored inside your program.

However, your program has some oddities. First of all, by all means add `use strict; use warnings;` to your code (and then fix all the error and warning messages you get after that). Also, always check whether opening a file succeeded (as you did when opening file1). Then, though it is not forbidden, it is odd that you declare your subroutine `grep_pattern` inside the while loop. I think you wanted to make it a closure (over `$pattern`); however, the way you wrote it, `$pattern` is a global variable and would be available to `grep_pattern` even if you declared the sub outside the block.

There are more issues with your program, though, and I wonder to what extent you have tested the code already; but please first fix the items I have mentioned, and let's discuss the remaining ones afterwards.

-- Ronald Fischer <ynnor@mm.st>
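A minimal sketch of the points above, assuming the sub is declared once at file scope and receives the pattern and the line as arguments. The argument list and the sample lines are illustrative and not part of the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared once at file scope instead of inside the while loop.
# Returns the line if it matches the pattern, otherwise an empty list.
sub grep_pattern {
    my ($pattern, $line) = @_;
    return $line =~ /$pattern/ ? $line : ();
}

my $pattern = qr/hello /;                                     # same pattern as the question
my @lines   = ("hello world", "goodbye", "say hello again");  # hypothetical sample data

# Collect the matching lines; a real script would read them from a file.
my @matches = map { grep_pattern($pattern, $_) } @lines;
print "$_\n" for @matches;
```

Passing `$pattern` explicitly means the sub no longer depends on a global, so it keeps working under `use strict`.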
Re: Compare multiple files and output differences by kcott (Archbishop) on Mar 08, 2012 at 09:16 UTC
Firstly, some notes on what you've provided.

Subroutine definitions
You're using sub inside a while loop. In rare cases, it may be appropriate to do this - this is not one of them. With the code you have here, a better place would be after `close (file1);`.

Subroutine calls
You've called your subroutine as `grep_pattern`; this would be better as `grep_pattern()`. See perlsub for more details.

Opening files
The three argument form of open is preferred. If `open()` fails, `$!` holds the reason why - you can use this in your error messages.

Feedback on your code
Perl will provide you with feedback when you make mistakes or attempt dangerous operations. To get this feedback you need to use the strict and warnings pragmata. To get more verbose messages, also use the diagnostics pragma.

Maintenance
Your code will be easier to read and maintain if it is laid out in a consistent manner. See perlstyle.

Here's a script which (hopefully) does everything you want. It doesn't use a subroutine and avoids the intermediate files. The hash `%entry_seen` keeps track, as the name suggests, of the entries you've seen.

#!/usr/bin/env perl

use strict;
use warnings;

my $out_file = q{result.list};
my $search_re = qr{hello};
my %entry_seen = ();

print q{Enter reference filename: };
chomp(my $ref_file = <STDIN>);

open my $fh_ref, q{<}, $ref_file or die qq{Can't open $ref_file: $!};
open my $fh_out, q{>}, $out_file or die qq{Can't open $out_file: $!};

while (defined(my $txt_file = <$fh_ref>)) {
    chomp $txt_file;
    open my $fh_txt, q{<}, $txt_file or do {
        warn qq{! SKIPPING: $txt_file: $!};
        next;
    };
    print qq{PROCESSING: $txt_file\n};
    while (defined(my $txt_line = <$fh_txt>)) {
        chomp $txt_line;
        next if $txt_line !~ $search_re;
        if (! $entry_seen{$txt_line}++) {
            print $fh_out qq{[$txt_file] $txt_line\n};
        }
    }
    close $fh_txt;
}

close $fh_out;
close $fh_ref;

Here's the contents of the various files used and an example run.
ken@ganymede: ~/tmp/PM_DIFF $ cat a.txt
b.txt
c.txt
dummy.txt
d.txt
ken@ganymede: ~/tmp/PM_DIFF $ cat b.txt
Hello
hello
hello, world
goodbye
ken@ganymede: ~/tmp/PM_DIFF $ cat c.txt
hello world
hullo
shellow
ken@ganymede: ~/tmp/PM_DIFF $ cat d.txt
hello, world
hello and goodbye
Hello, world
ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
ken@ganymede: ~/tmp/PM_DIFF $ pm_diff.pl
Enter reference filename: not_a_file
Can't open not_a_file: No such file or directory at ./pm_diff.pl line 13, <STDIN> line 1.
ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
ken@ganymede: ~/tmp/PM_DIFF $ pm_diff.pl
Enter reference filename: a.txt
PROCESSING: b.txt
PROCESSING: c.txt
! SKIPPING: dummy.txt: No such file or directory at ./pm_diff.pl line 19, <$fh_ref> line 3.
PROCESSING: d.txt
ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
[b.txt] hello
[b.txt] hello, world
[c.txt] hello world
[c.txt] shellow
[d.txt] hello and goodbye
ken@ganymede: ~/tmp/PM_DIFF $

Update: s/valid to do this/appropriate to do this/

-- Ken
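The original question also asks for the entries that are missing from, or extra in, each file relative to the reference. One possible way to get those is a pair of lookup hashes, sketched below. This is not kcott's script: the sub name `compare_entries` is hypothetical, and the two arrays are the lines from the sample b.txt and c.txt that match `hello`.

```perl
use strict;
use warnings;

# Compare a candidate list of entries against a reference list.
# Returns two array refs: entries present only in the reference (missing
# from the candidate) and entries present only in the candidate (extra).
sub compare_entries {
    my ($ref_entries, $cand_entries) = @_;
    my %in_ref  = map { $_ => 1 } @$ref_entries;
    my %in_cand = map { $_ => 1 } @$cand_entries;
    my @missing = grep { !$in_cand{$_} } sort keys %in_ref;
    my @extra   = grep { !$in_ref{$_} }  sort keys %in_cand;
    return (\@missing, \@extra);
}

# Matching lines taken from the sample b.txt (reference) and c.txt.
my @b = ('hello', 'hello, world');
my @c = ('hello world', 'shellow');

my ($missing, $extra) = compare_entries(\@b, \@c);
print "missing: $_\n" for @$missing;
print "extra: $_\n"   for @$extra;
```

Because the lookups are hashes, duplicate lines within one file collapse to a single entry; whether that is wanted depends on what counts as an "entry" in the log.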
Re: Compare multiple files and output differences by nemesdani (Friar) on Mar 08, 2012 at 08:57 UTC
Another minor suggestion, for AFTER you have cleaned things up as rovf suggested: the code will be more transparent if you store the filenames in an array: `@files = <file1>`. Then you can write a `foreach $file (@files)` loop and grep, storing the results in a simple array, or file, as you wish.
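As a rough sketch of that idea, and of the "one array per file" part of the question, a hash keyed by file name can hold an array of matches for each file. The in-memory `%content` hash below is a hypothetical stand-in for the real files so that the example runs on its own; a real script would open each name from the list instead.

```perl
use strict;
use warnings;

my $pattern = qr/hello /;

# Hypothetical stand-ins for the files named in a.txt.
my %content = (
    'b.txt' => "hello world\ngoodbye\n",
    'c.txt' => "oh hello there\nhello world\n",
);
my @files = sort keys %content;

my %matches;    # file name => array ref of matching lines
for my $file (@files) {
    # Opening a reference to a scalar reads from memory instead of disk.
    open my $fh, '<', \$content{$file} or die "Can't open $file: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    $matches{$file} = [ grep { /$pattern/ } @lines ];
}

for my $file (@files) {
    print "$file: $_\n" for @{ $matches{$file} };
}
```

With the results in `%matches`, no intermediate log files are needed, and any two files can be compared by looking up their arrays by name.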
Re: Compare multiple files and output differences by JavaFan (Canon) on Mar 08, 2012 at 10:04 UTC
Too hard to do in Perl. Let's use the shell!

> result.log
f=`head -n 1 a.txt`
grep PATTERN $f > $f.log
for g in `tail -n +2 a.txt`
do
    grep PATTERN $g > $g.log
    comm -3 $g.log $f.log >> result.log
    rm $g.log
done
rm $f.log

