Compare2Files LinebyLine

The objective was to find whether each line in one file exists in the other. This small program compares two text files line by line, case-insensitively, ignoring blank lines and lines beginning with #. I sort the two files first so that I traverse each of them just once. It is actually used in my company to check files against templates; hope you find it useful...
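The single pass described above can be isolated from the file handling as a small function. This is a sketch, not the script's own code: `merge_compare` is a hypothetical name, it walks two index pointers instead of shifting and unshifting `@temp`, and it assumes both lists are already lower-cased, stripped of blanks and comments, and sorted with Perl's default string `sort`.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lists of lines that are ALREADY sorted
# with Perl's default string sort, in a single merge-style pass.
# Returns refs to: template lines found in the dump, template lines
# missing from the dump, and dump lines not in the template.
sub merge_compare {
    my ($tmpl, $dump) = @_;
    my (@found, @missing, @extra);
    my ($i, $j) = (0, 0);
    while ($i < @$tmpl && $j < @$dump) {
        my $cmp = $tmpl->[$i] cmp $dump->[$j];
        if ($cmp < 0) {
            push @missing, $tmpl->[$i++];    # template line has no partner
        }
        elsif ($cmp > 0) {
            push @extra, $dump->[$j++];      # dump line has no partner
        }
        else {
            push @found, $tmpl->[$i++];      # match: consume one line from each side
            $j++;
        }
    }
    # Once one list runs out, everything left in the other is unmatched.
    push @missing, @$tmpl[$i .. $#$tmpl];
    push @extra,   @$dump[$j .. $#$dump];
    return (\@found, \@missing, \@extra);
}

my @tmpl = sort qw(gamma alpha beta);
my @dump = sort qw(delta beta alpha);
my ($found, $missing, $extra) = merge_compare(\@tmpl, \@dump);
print "Found $_\n"                for @$found;
print "Not found $_\n"            for @$missing;
print "Warning $_ not expected\n" for @$extra;
```

Duplicate lines are matched one for one: a line that appears twice in the template but once in the dump is reported once as found and once as missing, which is also what the original loop does.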

######################################################################
# Usage: perlscript [-w] -t template -d dump [-o outputfile]
# where  -w enables warnings, i.e. warns of extra lines in the dump
#           which are not in the template
#        -t template is the file you compare against
#        -d dump is the file you check (it's called the dump because I
#           was checking the dump of another program against a template)
#        -o outputfile is where the report goes; if no output file is
#           given, output goes to STDOUT
#
# Author: thesundayman (saurabhchandra@hotmail.com)
######################################################################

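use strict;
use warnings;

my $DEBUG = 0;

# Settings filled in from the command line by handleArgs()
my ($W, $template, $dump, $outfile);

handleArgs();

if ($outfile) {
    open(STDOUT, '>', $outfile) or die "Cannot redirect STDOUT to $outfile: $!\n";
}

# Both files are lower-cased (the comparison is case-insensitive),
# stripped of blank lines and comments, and chomped *before* sorting,
# so the sort order is exactly the order the merge loop below relies on.
my @temp = sort { $a cmp $b } readLines($dump);      # lines being checked
my @file = sort { $a cmp $b } readLines($template);  # lines expected

# Walk the two sorted lists in step; every dump line is consumed once.
OUTER:
foreach my $line (@file)
{
    while (@temp)
    {
        my $x = shift @temp;
        my $comparison = $line cmp $x;
        print "$line - $x = $comparison\n" if $DEBUG;
        if ($comparison > 0)
        {
            # $x sorts before every remaining template line: nothing matches it
            print "Warning $x not expected\n" if $W;
            next;
        }
        elsif ($comparison < 0)
        {
            # $x sorts after $line, so $line is missing from the dump;
            # put $x back to compare it with the next template line
            print "Not found $line\n";
            unshift(@temp, $x);
            next OUTER;
        }
        else
        {
            print "Found $line\n";
            next OUTER;
        }
    }
    # The dump is exhausted, so this template line cannot be in it
    print "Not found $line\n";
}

# Whatever is left in the dump sorts after every template line
if ($W)
{
    print "Warning $_ not expected\n" foreach @temp;
}

# Read a file into a list of lower-cased, chomped lines, skipping
# blank lines and lines beginning with '#'
sub readLines
{
    my ($name) = @_;
    open(my $fh, '<', $name) or die "Error opening $name: $!\n";
    my @lines;
    while (my $line = <$fh>)
    {
        chomp $line;
        next if $line =~ /^\s*$/;    # ignore empty lines
        next if $line =~ /^\s*#/;    # ignore comments
        push(@lines, lc $line);
    }
    close($fh);
    return @lines;
}

sub handleArgs
{
    while (@ARGV)
    {
        print "In handleArgs\n" if $DEBUG;
        my $arg = shift @ARGV;
        # Match whole switches only, so a file name is never mistaken for an option
        if    ($arg =~ /^-w$/i) { $W = 1; }                  # enable warnings
        elsif ($arg =~ /^-t$/i) { $template = shift @ARGV; }
        elsif ($arg =~ /^-d$/i) { $dump     = shift @ARGV; }
        elsif ($arg =~ /^-o$/i) { $outfile  = shift @ARGV; }
        else                    { warn "Invalid arg in FileCmp: $arg\n"; }
    }
    die "Usage: $0 [-w] -t template -d dump [-o outputfile]\n"
        unless defined $template && defined $dump;
}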
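The hand-rolled `handleArgs` loop could also be replaced with Getopt::Long, which ships with Perl. This is a swapped-in technique, not the original's: `parse_args` is a hypothetical helper, and it stores the switches in a hash keyed by option letter instead of in the script's own variables.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical replacement for handleArgs(): parse the same switches
# (-w, -t template, -d dump, -o outputfile) into a hash keyed by letter.
sub parse_args {
    my @args = @_;
    my %opt = (w => 0);
    my $usage = "Usage: $0 [-w] -t template -d dump [-o outputfile]\n";
    # 'w' is a plain flag; 't=s', 'd=s' and 'o=s' each take a string value.
    GetOptionsFromArray(\@args, \%opt, 'w', 't=s', 'd=s', 'o=s')
        or die $usage;
    die $usage unless defined $opt{t} && defined $opt{d};
    return \%opt;
}

# In the script itself this would be: my $opt = parse_args(@ARGV);
my $opt = parse_args('-w', '-t', 'template.txt', '-d', 'dump.txt');
print "template=$opt->{t} dump=$opt->{d} warnings=$opt->{w}\n";
```

By default Getopt::Long matches option names case-insensitively, which mirrors the `/i` in the original tests, and an unknown switch makes `GetOptionsFromArray` return false, so the sketch stops with the usage message instead of carrying on.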

Re: Compare2Files LinebyLine by merlyn (Sage) on Sep 26, 2001 at 17:43 UTC
Your code is invoking O(n squared) behavior. You should use a hash as a set type, instead of linearly comparing each line to the entire other file contents. As a brief example, the core of your program can be:

my %compare;
my %files = ( a => 'oldfile', b => 'newfile' ); # compare oldfile to newfile
for my $filekey (keys %files) {
  open F, $files{$filekey} or next;
  while (<F>) {
    next if /^(#.*)?\s*$/; # ignore blanks and comments
    $compare{lc $_} .= $filekey;
  }
}
print "Lines in newfile but not oldfile:\n",
  sort grep $compare{$_} !~ /a/, keys %compare;
print "Lines in oldfile but not newfile:\n",
  sort grep $compare{$_} !~ /b/, keys %compare;

I used this technique in a recent post as well, and you might see it clearer there. -- Randal L. Schwartz, Perl hacker
Re: Re: Compare2Files LinebyLine by thesundayman (Novice) on Sep 26, 2001 at 21:05 UTC
Thanks a ton. I can't imagine the feeling of getting a reply from the great merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n squared) way: since the files are sorted first, I make at most n comparisons (if you noticed, I keep emptying the @temp array). Thanks again, though.
Re: Re: Re: Compare2Files LinebyLine by merlyn (Sage) on Sep 27, 2001 at 00:59 UTC
The sorting and the splicing in fact add some complexity that mine doesn't, so it's still a lot more for you than O(n). -- Randal L. Schwartz, Perl hacker
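To put symbols on the exchange above (n and m are introduced here, not in the thread): let n be the number of significant template lines and m the number of significant dump lines, and treat each string comparison or hash operation as constant cost. Each pass through the merge loop either consumes a dump line or moves on to the next template line, so the loop itself makes at most n + m comparisons, but the two sorts dominate. Building and querying the hash is expected linear; the `sort` in merlyn's print statements only orders the k reported lines and could be dropped if order does not matter.

```latex
T_{\text{sort+merge}} = \underbrace{O(n\log n + m\log m)}_{\text{two sorts}} + \underbrace{O(n+m)}_{\text{at most } n+m \text{ comparisons}} = O(n\log n + m\log m)

T_{\text{hash}} = \underbrace{O(n+m)}_{\text{expected, build and query}} + \underbrace{O(k\log k)}_{\text{optional: sort the } k \text{ reported lines}}
```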
Re: Re: Re: Re: Compare2Files LinebyLine by zoot (Initiate) on Feb 17, 2003 at 20:25 UTC
Re: Re: Re: Re: Re: Compare2Files LinebyLine by BrowserUk (Patriarch) on Feb 17, 2003 at 21:21 UTC
Re: Re: Re: Re: Compare2Files LinebyLine by thesundayman (Novice) on Sep 27, 2001 at 16:06 UTC
Re: Compare2Files LinebyLine by vroom (His Eminence) on Sep 26, 2001 at 19:01 UTC
You might be interested in checking out Algorithm::Diff. There is a good post by that one guy that shows how to highlight differences in a file. You would probably want to read in each of the files and then split them on \n. You could then grep out lines containing only whitespace or comments. After that, you could use Algorithm::Diff to highlight any differences for you. vroom | Tim Vroom | vroom@blockstackers.com
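The steps vroom lists can be sketched as follows. `significant_lines` is a hypothetical helper name and the two sample strings stand in for real file contents; Algorithm::Diff is a CPAN module rather than core Perl, so the sketch only runs the diff when the module is installed. Its `sdiff` function returns one record per position flagged `u` (unchanged), `c` (changed), `-` (only in the first list) or `+` (only in the second).

```perl
use strict;
use warnings;

# Split a file's contents on "\n" and drop lines that contain only
# whitespace or whose first non-blank character is '#'.
sub significant_lines {
    my ($text) = @_;
    return grep { !/^\s*(?:#|$)/ } split /\n/, $text;
}

# Hypothetical sample contents standing in for the two files.
my @old = significant_lines("alpha\n# a comment\n\nbeta\ngamma\n");
my @new = significant_lines("alpha\nbeta\n   \ndelta\n");

# Algorithm::Diff is not part of core Perl, so only diff if it is installed.
if (eval { require Algorithm::Diff; 1 }) {
    # Each sdiff record is [flag, line from @old, line from @new].
    for my $rec (Algorithm::Diff::sdiff(\@old, \@new)) {
        my ($flag, $old_line, $new_line) = @$rec;
        print "$flag  $old_line | $new_line\n";
    }
}
else {
    warn "Algorithm::Diff is not installed; skipping the diff\n";
}
```

Unlike the sorted comparison in the original post, a diff keeps the files in their original order, so it also reports lines that are present in both files but have moved.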
Re: Re: Compare2Files LinebyLine by thesundayman (Novice) on Sep 26, 2001 at 21:17 UTC
Thanks a lot. Algorithm::Diff seems like a better option for this; reading the module even made me understand the diff problem better. Many thanks. thesundayman

