PerlMonks  

Compare2Files LinebyLine

by thesundayman (Novice)
on Sep 26, 2001 at 11:31 UTC ( [id://114757]=CUFP )

The objective was to find out whether a line in one file exists in the other. This small program compares two text files line by line, ignoring blank lines and lines beginning with #. I sort the two files first so that I traverse them just once. It is actually used at my company to check files against templates. Hope you find it useful ...
########################################################################
# Usage: perlscript [-w] -t template -d dump [-o outputfile]
# where -w enables warnings, i.e. warns of extra lines in the dump
#          which are not in the template
#       -t template is the file you compare against
#       -d dump is what you check (it's "dump" because I was checking
#          the dump of another program against a template)
# If no output file is given, output goes to STDOUT.
#
# Author: thesundayman(saurabhchandra@hotmail.com)
########################################################################

$DEBUG = 0;

handleArgs();

if($outfile)
{
    open (STDOUT, ">$outfile") || die "Cannot redirect STDOUT\n";
}

open(TEMP,$dump) || die "Error opening $dump\n";
open(FILE,$template) || die "Error opening $template\n";

# Read both files, lowercasing every line so the comparison is case-insensitive
while(<TEMP>)
{
    tr/A-Z/a-z/;
    push(@temp, $_);
}

while(<FILE>)
{
    tr/A-Z/a-z/;
    push(@file, $_);
}

@temp = sort(@temp);
@file = sort(@file);

OUTER:
foreach $_ (@file)
{
    next if /^\s*$/;    #ignore empty lines
    next if /^\s*#/;    #ignore comments

    chomp;
    $_ = lc;
    while(@temp)
    {
        $x = shift @temp;
        next if ($x =~ /^\s*$/);
        next if ($x =~ /^\s*#/);
        chomp $x;

        $comparison = $_ cmp $x;
        print "$_ - $x = $comparison\n" if $DEBUG;
        if($comparison > 0)
        {
            print "Warning $x not expected....\n" if $W;
            next;
        }
        elsif ($comparison < 0)
        {
            print "Not Found $_\n";
            unshift(@temp, $x);
            next OUTER;
        }
        elsif ($comparison == 0)
        {
            print "Found $_ \n";
            next OUTER;
        }
    }
    print "Not found $_\n";
}

if($W)
{
    foreach $temp (@temp)
    {
        print "Warning $temp not expected\n";
    }
}

sub handleArgs()
{
    while(@ARGV)
    {
        print "In Handle args\n" if $DEBUG;
        #Enable warnings
        if($ARGV[0] =~ /-W/i)
        {
            $W = 1;
            shift(@ARGV);
        }
        elsif($ARGV[0] =~ /-t/i)
        {
            $template = $ARGV[1];
            splice(@ARGV, 0, 2);
        }
        elsif($ARGV[0] =~ /-d/i)
        {
            $dump = $ARGV[1];
            splice(@ARGV, 0, 2);
        }
        elsif($ARGV[0] =~ /-O/i)
        {
            $outfile = $ARGV[1];
            splice(@ARGV, 0, 2);
        }
        else
        {
            print "Invalid Arg in FileCmp ", shift (@ARGV),"\n";
        }
    }
}

Replies are listed 'Best First'.
Re: Compare2Files LinebyLine
by merlyn (Sage) on Sep 26, 2001 at 17:43 UTC
    Your code is invoking O(n squared) behavior. You should use a hash as a set type, instead of linearly comparing each line to the entire other file contents.

    As a brief example, the core of your program can be:

    my %compare;
    my %files = ( a => 'oldfile', b => 'newfile' ); # compare oldfile to newfile
    for my $filekey (keys %files) {
      open F, $files{$filekey} or next;
      while (<F>) {
        next if /^\s*(#.*)?$/; # ignore blanks and comments
        $compare{lc $_} .= $filekey;
      }
    }
    print "Lines in newfile but not oldfile:\n",
      sort grep $compare{$_} !~ /a/, keys %compare;
    print "Lines in oldfile but not newfile:\n",
      sort grep $compare{$_} !~ /b/, keys %compare;
    I used this technique in a recent post as well, and you might see it clearer there.

    -- Randal L. Schwartz, Perl hacker
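
To see what the hash-as-set technique stores, here is a minimal sketch on in-memory lists instead of files. The helper name tag_lines and the sample data are assumptions made for illustration, not part of the reply above. Each line ends up tagged with the key of every list it appears in, so a value without 'a' means "only in b", and vice versa.

```perl
use strict;
use warnings;

# Hypothetical helper: tag each line with the key of every list it appears in.
sub tag_lines {
    my (%lists) = @_;
    my %compare;
    for my $key (sort keys %lists) {
        my %seen;
        for my $line (@{ $lists{$key} }) {
            next if $seen{$line}++;      # tag a line only once per list
            $compare{$line} .= $key;     # 'a', 'b', or 'ab'
        }
    }
    return %compare;
}

# Sample data (made up for this sketch)
my %compare = tag_lines(
    a => [qw(alpha beta gamma)],
    b => [qw(beta gamma delta)],
);

for my $line (sort keys %compare) {
    print "$line => $compare{$line}\n";
}
```

Each line costs one hash lookup, so the work grows with the total number of lines rather than with the product of the two file sizes.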

      Thanks a ton. Can't imagine the feeling of having a reply from the great Merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n squared) way: since the files are sorted first, I make at most n comparisons, and I keep emptying the @temp array, if you noticed. Thanks again, though.
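
The single-pass argument can be sketched with two indices instead of shifting @temp. This is an illustration under assumptions, not the original script: merge_compare and the sample lists are hypothetical, and blank-line and comment filtering is left out. After sorting, every comparison advances at least one index, so the merge step itself is linear in the combined length of the two lists.

```perl
use strict;
use warnings;

# Hypothetical helper: classify lines with one merge pass over two sorted lists.
# Returns three array refs: lines found in both, template lines missing from
# the dump, and dump lines not expected by the template.
sub merge_compare {
    my ($template, $dump) = @_;
    my @t = sort @$template;
    my @d = sort @$dump;
    my (@found, @missing, @extra);
    my ($i, $j) = (0, 0);

    while ($i < @t && $j < @d) {
        my $cmp = $t[$i] cmp $d[$j];
        if    ($cmp == 0) { push @found,   $t[$i]; $i++; $j++ }
        elsif ($cmp <  0) { push @missing, $t[$i]; $i++ }
        else              { push @extra,   $d[$j]; $j++ }
    }

    # Whatever is left on either side has no partner on the other.
    push @missing, @t[$i .. $#t];
    push @extra,   @d[$j .. $#d];
    return (\@found, \@missing, \@extra);
}

# Sample data (made up for this sketch)
my ($found, $missing, $extra) = merge_compare(
    [qw(alpha beta delta)],
    [qw(beta gamma delta)],
);
print "Found: @$found\n";
print "Not found: @$missing\n";
print "Not expected: @$extra\n";
```

The two sorts still cost O(n log n), so the hash approach remains cheaper overall for large files.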
Re: Compare2Files LinebyLine
by vroom (His Eminence) on Sep 26, 2001 at 19:01 UTC
    You might be interested in checking out Algorithm::Diff. There is a good post by that one guy that shows how to highlight differences in a file. You would probably want to read in each of the files and then split them on \n. You could then grep out lines containing only whitespace or comments. After that you could use Algorithm::Diff to highlight any differences for you.

    vroom | Tim Vroom | vroom@blockstackers.com
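
A minimal sketch of that approach, assuming the Algorithm::Diff module from CPAN is installed. The helper significant_lines and the sample text are hypothetical; only diff() comes from the module. Unlike the sorted comparison, this keeps the lines in their original order.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(diff);

# Hypothetical helper: split text on newlines, then drop lines that are
# blank or contain only a comment.
sub significant_lines {
    my ($text) = @_;
    return grep { !/^\s*(?:#.*)?$/ } split /\n/, $text;
}

# Sample data (made up for this sketch); in practice this would be file contents.
my @old = significant_lines("# template\nalpha\nbeta\n\ngamma\n");
my @new = significant_lines("alpha\nbeta2\ngamma\ndelta\n");

# diff() returns a list of hunks; each hunk is a list of
# [ sign, position, line ] entries, where sign is '-' or '+'.
my @report;
for my $hunk (diff(\@old, \@new)) {
    for my $change (@$hunk) {
        my ($sign, $pos, $line) = @$change;
        push @report, "$sign $line";
    }
}
print "$_\n" for @report;
```

Lines marked '-' exist only in the first list and lines marked '+' only in the second.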
      Thanks a lot.

      Algorithm::Diff seems like the better option here; reading the module even helped me understand the diff problem better. Many thanks.

      thesundayman
