PerlMonks  

Compare multiple files and output differences

by prescott2006 (Acolyte)
on Mar 08, 2012 at 05:53 UTC ( [id://958432]=perlquestion )

prescott2006 has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script that reads a text file, say a.txt, containing a list of text files to be analyzed, say b.txt, c.txt and d.txt. Each of b, c and d will be opened and scanned to capture a certain string using pattern matching. The results will be stored in a corresponding log file. After that, b.log will be treated as the reference: c.log and d.log will be compared with b.log, and their missing and extra entries will be written to result.list. This is the code I have written so far.
#!/usr/bin/perl

print "Enter filename:";
chomp($fname = <STDIN>);

# Open the file which contains list of files to be opened. If unsuccessful, print an error message and quit.
open (file1,$fname) || die "Can't Open File: $fname\n";

while (<file1>)
{
    chomp;
    open (file2, $_);    # Open file in the list one by one
    $temp = $_;          # Store the file name to name the corresponding log file
    print "$temp\n";     # Print the file name to indicate the different output
    while (<file2>)
    {
        chomp;
        sub grep_pattern    # Print strings which contain the pattern
        {foreach ($_)
            {if (/$pattern/)
                {
                    print "$1\n";
                    open (file3, ">>$temp.log");
                    print file3 "$1\n";    # Print result in log file in append mode
                    close (file3);
                }
            }
        }
        $pattern = 'hello ';    # Find the pattern
        grep_pattern;
    }
    print "\n";    # Print a blank line to separate the results from each file
}
close (file1);
Ideally, though, the log files would not be needed: the intermediate results could be stored in arrays or hashes and compared directly. But I don't know how to create a separate array for each of b.txt, c.txt and d.txt automatically. Can anyone help?

Replies are listed 'Best First'.
Re: Compare multiple files and output differences
by rovf (Priest) on Mar 08, 2012 at 08:45 UTC
    As for your question about calculating the differences, you could for instance use Text::Diff from CPAN. This has a pretty general interface and you can use it also for data stored inside your program.

    However, your program has some oddities. First of all, by all means add use strict; use warnings; to your code (and then fix all the error- and warning messages you will get after that). Also, always check whether opening the file works (as you did when opening file1). Then, though not forbidden, it is odd that you declare your subroutine grep_pattern inside the while loop. I think you want to make it a closure (using $pattern); however, in the way you wrote it, $pattern is a global variable and would be available to grep_pattern even if you declare the sub outside the block.

    There are more issues to your program, though, and I wonder to what extent you have tested the code already; but please fix first the items I have mentioned, and let's discuss the remaining ones afterwards.
    -- 
    Ronald Fischer <ynnor@mm.st>
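
    To illustrate the point about where the subroutine lives and how it sees $pattern, here is a minimal sketch (not the original poster's code). It assumes a hypothetical rewrite of grep_pattern that is declared once at file level and receives the pattern and the lines as arguments, so it runs cleanly under strict and warnings without relying on a global.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical rewrite of grep_pattern: declared once, outside any loop.
# The pattern is passed in explicitly instead of being read from a global.
sub grep_pattern {
    my ($pattern, @lines) = @_;
    return grep { /$pattern/ } @lines;
}

my $pattern = qr/hello /;    # same literal as in the question
my @lines   = ("hello world", "goodbye", "say hello again");

# Explicit call with parentheses and arguments.
my @matches = grep_pattern($pattern, @lines);
print "$_\n" for @matches;
```

    Passing the pattern as an argument keeps the sub usable with more than one pattern, which a global would not allow without extra bookkeeping.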
Re: Compare multiple files and output differences
by kcott (Archbishop) on Mar 08, 2012 at 09:16 UTC

    Firstly, some notes on what you've provided.

    Subroutine definitions
    You're using sub inside a while loop. In rare cases, it may be appropriate to do this - this is not one of them. With the code you have here, a better place would be after close (file1);.
    Subroutine calls
    You've called your subroutine as grep_pattern; this would be better as grep_pattern(). See perlsub for more details.
    Opening files
    The three argument form of open is preferred. If open() fails, $! holds the reason why - you can use this in your error messages.
    Feedback on your code
    Perl will provide you with feedback when you make mistakes or attempt dangerous operations. To get this feedback you need to use the strict and warnings pragmata. To get more verbose messages, also use the diagnostics pragma.
    Maintenance
    Your code will be easier to read and maintain if it is laid out in a consistent manner. See perlstyle.
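
    The notes on opening files and calling subroutines can be sketched together as follows. This is only an illustration: read_lines is a hypothetical helper name and no_such_file.txt is assumed not to exist.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file.
# Uses the three-argument form of open with a lexical filehandle,
# and reports the reason from $! if the open fails.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or do {
        warn "Can't open $path: $!\n";
        return;    # empty list in list context
    };
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Call with parentheses, as recommended above.
my @lines = read_lines('no_such_file.txt');    # assumed missing, so this warns
print scalar(@lines), " lines read\n";
```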

    Here's a script which (hopefully) does everything you want. It doesn't use a subroutine and avoids the intermediate files. The hash %entry_seen keeps track, as the name suggests, of the entries you've seen.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $out_file = q{result.list};
    my $search_re = qr{hello};
    my %entry_seen = ();

    print q{Enter reference filename: };
    chomp(my $ref_file = <STDIN>);

    open my $fh_ref, q{<}, $ref_file or die qq{Can't open $ref_file: $!};
    open my $fh_out, q{>}, $out_file or die qq{Can't open $out_file: $!};

    while (defined(my $txt_file = <$fh_ref>)) {
        chomp $txt_file;

        open my $fh_txt, q{<}, $txt_file or do {
            warn qq{! SKIPPING: $txt_file: $!};
            next;
        };

        print qq{PROCESSING: $txt_file\n};

        while (defined(my $txt_line = <$fh_txt>)) {
            chomp $txt_line;
            next if $txt_line !~ $search_re;

            if (! $entry_seen{$txt_line}++) {
                print $fh_out qq{[$txt_file] $txt_line\n};
            }
        }

        close $fh_txt;
    }

    close $fh_out;
    close $fh_ref;

    Here's the contents of the various files used and an example run.

    ken@ganymede: ~/tmp/PM_DIFF $ cat a.txt
    b.txt
    c.txt
    dummy.txt
    d.txt
    ken@ganymede: ~/tmp/PM_DIFF $ cat b.txt
    Hello
    hello
    hello, world
    goodbye
    ken@ganymede: ~/tmp/PM_DIFF $ cat c.txt
    hello world
    hullo
    shellow
    ken@ganymede: ~/tmp/PM_DIFF $ cat d.txt
    hello, world
    hello and goodbye
    Hello, world
    ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
    ken@ganymede: ~/tmp/PM_DIFF $ pm_diff.pl
    Enter reference filename: not_a_file
    Can't open not_a_file: No such file or directory at ./pm_diff.pl line 13, <STDIN> line 1.
    ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
    ken@ganymede: ~/tmp/PM_DIFF $ pm_diff.pl
    Enter reference filename: a.txt
    PROCESSING: b.txt
    PROCESSING: c.txt
    ! SKIPPING: dummy.txt: No such file or directory at ./pm_diff.pl line 19, <$fh_ref> line 3.
    PROCESSING: d.txt
    ken@ganymede: ~/tmp/PM_DIFF $ cat result.list
    [b.txt] hello
    [b.txt] hello, world
    [c.txt] hello world
    [c.txt] shellow
    [d.txt] hello and goodbye
    ken@ganymede: ~/tmp/PM_DIFF $

    Update: s/valid to do this/appropriate to do this/

    -- Ken
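
    The script above records each matching entry the first time it is seen; it does not report which entries are missing from, or extra in, c.txt and d.txt relative to b.txt, which is what the original question asks for. One possible way to do that comparison with hashes is sketched below. The in-memory data and the helper name compare_to_reference are assumptions made for illustration; the entries are the lines that match qr{hello} in the sample files shown in the example run.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for the matched lines of each file.
my %entries = (
    'b.txt' => [ 'hello', 'hello, world' ],
    'c.txt' => [ 'hello world', 'shellow' ],
    'd.txt' => [ 'hello, world', 'hello and goodbye' ],
);

# Hypothetical helper: compare one list against the reference list.
# Returns two array refs: entries of the reference that are missing
# from the other list, and entries of the other list not in the reference.
sub compare_to_reference {
    my ($ref_list, $other_list) = @_;
    my %in_ref   = map { $_ => 1 } @$ref_list;
    my %in_other = map { $_ => 1 } @$other_list;
    my @missing  = grep { !$in_other{$_} } @$ref_list;
    my @extra    = grep { !$in_ref{$_} }   @$other_list;
    return (\@missing, \@extra);
}

my $reference = 'b.txt';
for my $file (sort grep { $_ ne $reference } keys %entries) {
    my ($missing, $extra) =
        compare_to_reference($entries{$reference}, $entries{$file});
    print "[$file] missing: $_\n" for @$missing;
    print "[$file] extra: $_\n"   for @$extra;
}
```

    With this data, d.txt is reported as missing "hello" and having the extra entry "hello and goodbye". Writing to result.list instead of STDOUT would only need a filehandle opened as in the script above.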

Re: Compare multiple files and output differences
by nemesdani (Friar) on Mar 08, 2012 at 08:57 UTC
    Another minor suggestion, for AFTER you have cleaned things up as rovf suggested:
    Code will be more transparent if you store the filenames in an array:
     @files = <file1>
    Then you can loop with foreach $file (@files) and grep, storing the results in a simple array or a file, as you wish.
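
    A minimal sketch of this approach, which also gives each listed file its own array automatically by using a hash of arrays keyed on the filename. The sample files are created in a scratch directory purely so the example runs as-is; their names and contents are assumptions borrowed from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create hypothetical sample files in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'a.txt' => "b.txt\nc.txt\n",
    'b.txt' => "Hello\nhello\nhello, world\ngoodbye\n",
    'c.txt' => "hello world\nhullo\nshellow\n",
);
for my $name (keys %sample) {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$out} $sample{$name};
    close $out;
}

# Read the whole list of filenames into an array in one go.
open my $list_fh, '<', "$dir/a.txt" or die "Can't open $dir/a.txt: $!";
chomp(my @files = <$list_fh>);
close $list_fh;

# One array of matches per file, created on demand in a hash of arrays.
my %matches;
for my $file (@files) {
    open my $fh, '<', "$dir/$file" or do {
        warn "Skipping $file: $!\n";
        next;
    };
    chomp(my @lines = <$fh>);
    close $fh;
    $matches{$file} = [ grep { /hello/ } @lines ];
}

print "$_: @{ $matches{$_} }\n" for sort keys %matches;
```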
Re: Compare multiple files and output differences
by JavaFan (Canon) on Mar 08, 2012 at 10:04 UTC
    Too hard to do in Perl. Let's use the shell!
    > result.log
    f=`head -1 a.txt`
    grep PATTERN $f > $f.log
    for g in `tail +2 a.txt`
    do
        grep PATTERN $g > $g.log
        comm -3 $g.log $f.log >> result.log
        rm $g.log
    done
    rm $f.log
