Re: Compare2Files LinebyLine

Your code is invoking O(n squared) behavior. You should use a hash as a set type, instead of linearly comparing each line to the entire contents of the other file.

As a brief example, the core of your program can be:

my %compare;
my %files = ( a => 'oldfile', b => 'newfile' ); # compare oldfile to newfile
for my $filekey (keys %files) {
  open F, $files{$filekey} or next;
  while (<F>) {
    next if /^(#.*)?\s*$/; # ignore blanks and comments
    $compare{lc $_} .= $filekey;
  }
}
print "Lines in newfile but not oldfile:\n",
  sort grep $compare{$_} !~ /a/, keys %compare;
print "Lines in oldfile but not newfile:\n",
  sort grep $compare{$_} !~ /b/, keys %compare;
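
For anyone who wants to try the hash-as-set idea without setting up two files first, here is a minimal sketch of the same approach run against in-memory lists. The diff_lines helper and the sample data are made up for illustration and are not part of the code above; the comparison is case-insensitive because of the lc, just as in the original.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): takes two array refs of
# lines and returns the lines unique to each side, compared case-insensitively.
sub diff_lines {
    my ($old, $new) = @_;
    my %seen;    # lowercased line => tags of the sources it appeared in
    my %sources = ( a => $old, b => $new );
    for my $tag (sort keys %sources) {
        for my $line (@{ $sources{$tag} }) {
            next if $line =~ /^(#.*)?\s*$/;    # skip blank lines and comments
            $seen{lc $line} .= $tag;           # one hash operation per line
        }
    }
    my @only_new = sort grep { $seen{$_} !~ /a/ } keys %seen;
    my @only_old = sort grep { $seen{$_} !~ /b/ } keys %seen;
    return (\@only_old, \@only_new);
}

# Sample data, assumed purely for the demonstration.
my @old = ("alpha\n", "beta\n", "# a comment\n", "\n");
my @new = ("beta\n", "GAMMA\n", "Alpha\n", "delta\n");

my ($only_old, $only_new) = diff_lines(\@old, \@new);
print "Lines in newfile but not oldfile:\n", @$only_new;
print "Lines in oldfile but not newfile:\n", @$only_old;
```

Each line costs one hash lookup, so the hashing pass grows linearly with the total number of lines; the only sort left is over the keys that end up being reported.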

I used this technique in a recent post as well, and you might see it clearer there.

-- Randal L. Schwartz, Perl hacker


Re: Re: Compare2Files LinebyLine by thesundayman (Novice) on Sep 26, 2001 at 21:05 UTC
Thanks a ton. Can't imagine the feeling of having a reply from the great Merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n squared) way: since the files are sorted first, I make at most n comparisons, and I keep emptying the @temp array, if you noticed. Thanks again though.
Re: Re: Re: Compare2Files LinebyLine by merlyn (Sage) on Sep 27, 2001 at 00:59 UTC
The sorting and the splicing in fact add some complexity that mine doesn't, so it's still a lot more for you than O(n).

-- Randal L. Schwartz, Perl hacker
Re: Re: Re: Re: Compare2Files LinebyLine by zoot (Initiate) on Feb 17, 2003 at 20:25 UTC
Hi Folks. Do you guys happen to have any suggestions for comparing 2 files line by line that don't involve loading all the lines into memory? I'm trying to compare two files that are each over 300MB in size. My system doesn't have enough memory to handle loading all the file lines into a hash. I've tried the readline approach but it takes forever to run. Unfortunately, I'm not able to load the data into a database either - even a Berkeley DB. Any ideas would be appreciated.
Re: Re: Re: Re: Re: Compare2Files LinebyLine by BrowserUk (Patriarch) on Feb 17, 2003 at 21:21 UTC
Re: Re: Re: Re: Compare2Files LinebyLine by thesundayman (Novice) on Sep 27, 2001 at 16:06 UTC
Right as always :-)

