PerlMonks  

Compare2Files LinebyLine

by thesundayman (Novice)
on Sep 26, 2001 at 11:31 UTC (#114757=snippet)

Description: The objective was to find whether each line in one file exists in the other. This small program compares two text files line by line, ignoring blank lines and lines beginning with #. I sort both files first so that I only have to traverse each of them once. It is actually used at my company to check files against templates; I hope you find it useful...
########################################################################
#Usage: perlscript [-w] -t template -d dump [-o outputfile]
#where  -w enables warnings, i.e. warns of extra lines in the dump which
#          are not in the template
#       -t template is the file you compare against
#       -d dump is the file you check (it's called dump because I was
#          checking the dump of another program against a template)
#If no output file is given, output goes to STDOUT
#
#Author: thesundayman(saurabhchandra@hotmail.com)
########################################################################

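use strict;
use warnings;

my $DEBUG = 0;
my ($W, $template, $dump, $outfile);    # set by handleArgs()

handleArgs();

die "Usage: $0 [-w] -t template -d dump [-o outputfile]\n"
    unless defined $template && defined $dump;

if ($outfile) {
    open(STDOUT, '>', $outfile) or die "Cannot redirect STDOUT to $outfile: $!\n";
}

# Filter, chomp and lower-case *before* sorting, so the sorted order is
# exactly the order that the cmp test below relies on.
my @temp = readLines($dump);        # lines we check
my @file = readLines($template);    # lines we expect
@temp = sort @temp;
@file = sort @file;

# Both lists are sorted, so walk them like a merge: each comparison either
# consumes a dump line or moves on to the next template line.
OUTER:
foreach my $line (@file)
{
    while (@temp)
    {
        my $x = shift @temp;
        my $comparison = $line cmp $x;
        print "$line - $x = $comparison\n" if $DEBUG;
        if ($comparison > 0)
        {
            # $x sorts before the template line, so the template lacks it
            print "Warning $x not expected\n" if $W;
            next;
        }
        elsif ($comparison < 0)
        {
            # $x sorts after the template line: the line is missing from
            # the dump; put $x back for the next template line
            print "Not found $line\n";
            unshift(@temp, $x);
            next OUTER;
        }
        else
        {
            print "Found $line\n";
            next OUTER;
        }
    }
    # dump exhausted: every remaining template line is missing
    print "Not found $line\n";
}

# whatever is left in the dump was never matched
if ($W)
{
    print "Warning $_ not expected\n" foreach @temp;
}

# Read a file, dropping blank lines and lines beginning with #,
# and return the remaining lines chomped and lower-cased.
sub readLines
{
    my ($name) = @_;
    open(my $fh, '<', $name) or die "Error opening $name: $!\n";
    my @lines;
    while (my $line = <$fh>)
    {
        chomp $line;
        next if $line =~ /^\s*$/;    # ignore empty lines
        next if $line =~ /^\s*#/;    # ignore comments
        push(@lines, lc $line);
    }
    close($fh);
    return @lines;
}

sub handleArgs
{
    while (@ARGV)
    {
        print "In handleArgs\n" if $DEBUG;
        my $arg = shift(@ARGV);
        # match whole options only (still case-insensitive)
        if    ($arg =~ /^-w$/i) { $W = 1; }                 # enable warnings
        elsif ($arg =~ /^-t$/i) { $template = shift(@ARGV); }
        elsif ($arg =~ /^-d$/i) { $dump     = shift(@ARGV); }
        elsif ($arg =~ /^-o$/i) { $outfile  = shift(@ARGV); }
        else                    { warn "Invalid Arg in FileCmp $arg\n"; }
    }
}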
Re: Compare2Files LinebyLine
by merlyn (Sage) on Sep 26, 2001 at 17:43 UTC
    Your code is invoking O(n squared) behavior. You should use a hash as a set type, instead of linearly comparing each line to the entire other file contents.

    As a brief example, the core of your program can be:

        my %compare;
        my %files = ( a => 'oldfile', b => 'newfile' ); # compare oldfile to newfile
        for my $filekey (keys %files) {
            open F, $files{$filekey} or next;
            while (<F>) {
                next if /^(#.*)?\s*$/; # ignore blanks and comments
                $compare{lc $_} .= $filekey;
            }
        }
        print "Lines in newfile but not oldfile:\n",
            sort grep $compare{$_} !~ /a/, keys %compare;
        print "Lines in oldfile but not newfile:\n",
            sort grep $compare{$_} !~ /b/, keys %compare;

    I used this technique in a recent post as well, and you might see it clearer there.

    -- Randal L. Schwartz, Perl hacker
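Applied to the original template check, the hash-as-set idea might look like the sketch below. This is not merlyn's code: `checkAgainstTemplate` and `significant` are hypothetical names, the lines are passed in as array refs instead of file names so the sketch runs without any files, and, unlike the sorted walk, a set ignores how many times a line repeats.

```perl
use strict;
use warnings;

# Lower-case and chomp a copy of each line, then skip blanks and # comments.
sub significant {
    return grep { !/^\s*$/ && !/^\s*#/ } map { my $l = lc $_; chomp $l; $l } @_;
}

# Hypothetical helper: hash-as-set version of the template check.
# Returns array refs of found, missing and unexpected ("extra") lines.
sub checkAgainstTemplate {
    my ($template, $dump) = @_;
    my %inDump;
    $inDump{$_} = 1 for significant(@$dump);
    my (%inTemplate, @found, @missing);
    for my $line (significant(@$template)) {
        $inTemplate{$line} = 1;
        if ($inDump{$line}) { push @found, $line }
        else                { push @missing, $line }
    }
    # dump lines that never appeared in the template, sorted for stable output
    my @extra = sort grep { !$inTemplate{$_} } keys %inDump;
    return (\@found, \@missing, \@extra);
}

my ($found, $missing, $extra) = checkAgainstTemplate(
    ["Alpha\n", "# a comment\n", "beta\n", "\n"],
    ["alpha\n", "gamma\n"],
);
print "Found $_\n"                foreach @$found;
print "Not found $_\n"            foreach @$missing;
print "Warning $_ not expected\n" foreach @$extra;
```

With the sample data it reports `alpha` as found, `beta` as not found and `gamma` as unexpected. Hash lookups take constant time on average, so the found/missing part needs no sort at all; only the list of extras is sorted.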

      Thanks a ton. I can't imagine the feeling of getting a reply from the great Merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n squared) way: since both files are sorted first, the loop makes a single pass through each of them (if you noticed, I keep emptying the @temp array), so after the sort the number of comparisons is linear. Thanks again, though.
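To put numbers on the complexity question: after the O(n log n) sorts, every `cmp` in the walk either permanently removes a dump line (the "not expected" and "found" cases) or advances to the next template line (the "not found" case), so there are at most (template lines + dump lines) comparisons. The sketch below restates the walk as a hypothetical `mergeCompare` subroutine that takes already filtered, lower-cased lines and counts the comparisons; it is an illustration of the technique, not the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper: the sorted merge walk on lines that are already
# filtered and lower-cased, instrumented to count cmp calls.
sub mergeCompare {
    my ($expected, $actual) = @_;
    my @want = sort @$expected;
    my @have = sort @$actual;
    my (@found, @missing, @extra);
    my $comparisons = 0;
    OUTER:
    for my $line (@want) {
        while (@have) {
            my $x = shift @have;
            $comparisons++;
            my $c = $line cmp $x;
            if ($c > 0) { push @extra, $x; next }      # $x is not in the template
            if ($c < 0) {                              # $line is missing; keep $x
                unshift @have, $x;
                push @missing, $line;
                next OUTER;
            }
            push @found, $line;                        # $c == 0
            next OUTER;
        }
        push @missing, $line;                          # dump exhausted
    }
    push @extra, @have;                                # leftovers never matched
    return { found => \@found, missing => \@missing,
             extra => \@extra, comparisons => $comparisons };
}

my $r = mergeCompare([qw(a b c d)], [qw(f b e d)]);
printf "found: %s; missing: %s; extra: %s; comparisons: %d\n",
    "@{$r->{found}}", "@{$r->{missing}}", "@{$r->{extra}}", $r->{comparisons};
# prints: found: b d; missing: a c; extra: e f; comparisons: 4
```

Four template lines and four dump lines give four comparisons here, well under the bound of eight.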
Re: Compare2Files LinebyLine
by vroom (Pope) on Sep 26, 2001 at 19:01 UTC
    You might be interested in checking out Algorithm::Diff. There is a good post by that one guy that shows how to highlight differences in a file. You would probably want to read in each of the files and then split them on \n. You could then grep out lines containing only whitespace or comments. After that you could use Algorithm::Diff to highlight any differences for you.

    vroom | Tim Vroom | vroom@blockstackers.com
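The recipe above (split each file on \n, grep out whitespace-only and comment lines, then diff) can be sketched in core Perl. Algorithm::Diff is a CPAN module built around the longest common subsequence (LCS) of two sequences; to keep this example self-contained, the sketch computes a small LCS line diff by hand instead of calling the module, so `significantLines` and `lcsDiff` are hypothetical names and not Algorithm::Diff's API. The table needs memory proportional to the product of the two line counts, so it only suits small files.

```perl
use strict;
use warnings;

# Split text on \n and drop lines that are blank or start with # (after optional whitespace).
sub significantLines {
    my ($text) = @_;
    return grep { !/^\s*$/ && !/^\s*#/ } split /\n/, $text;
}

# Hypothetical LCS line diff: returns [op, line] pairs where ' ' marks a
# common line, '-' a line only in @$old and '+' a line only in @$new.
sub lcsDiff {
    my ($old, $new) = @_;
    my ($n, $m) = (scalar @$old, scalar @$new);
    my @len;    # $len[$i][$j] = LCS length of @$old[$i..] and @$new[$j..]
    for my $i (reverse 0 .. $n) {
        for my $j (reverse 0 .. $m) {
            if ($i == $n || $j == $m) {
                $len[$i][$j] = 0;
            }
            elsif ($old->[$i] eq $new->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($skipOld, $skipNew) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skipOld >= $skipNew ? $skipOld : $skipNew;
            }
        }
    }
    # Walk the table from the top-left corner to emit the edit script.
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < $n && $j < $m) {
        if ($old->[$i] eq $new->[$j]) {
            push @out, [' ', $old->[$i]];
            $i++;
            $j++;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            push @out, ['-', $old->[$i++]];
        }
        else {
            push @out, ['+', $new->[$j++]];
        }
    }
    push @out, ['-', $old->[$i++]] while $i < $n;
    push @out, ['+', $new->[$j++]] while $j < $m;
    return @out;
}

my @old = significantLines("a\n# comment\nb\nc\n");
my @new = significantLines("a\nc\n   \nd\n");
print "$_->[0] $_->[1]\n" foreach lcsDiff(\@old, \@new);
```

The comment and the whitespace-only line are filtered out before diffing, so the example reports only that b was removed and d was added, with a and c in common.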
      Thanks a lot.

      Algorithm::Diff seems like a better option for this; reading the module even helped me understand the diff problem better. Many thanks.

      thesundayman
