PerlMonks  

Re: Compare2Files LinebyLine

by merlyn (Sage)
on Sep 26, 2001 at 17:43 UTC ( [id://114809]=note )


in reply to Compare2Files LinebyLine

Your code has O(n squared) behavior, because it compares each line linearly against the entire contents of the other file. You should use a hash as a set type instead, so that each line costs a single hash lookup.

As a brief example, the core of your program can be:

    my %compare;
    my %files = ( a => 'oldfile', b => 'newfile' ); # compare oldfile to newfile
    for my $filekey (keys %files) {
      open F, $files{$filekey} or next;
      while (<F>) {
        next if /^(#.*)?\s*$/; # ignore blanks and comments
        $compare{lc $_} .= $filekey; # record which file(s) contain this line
      }
    }
    print "Lines in newfile but not oldfile:\n",
      sort grep $compare{$_} !~ /a/, keys %compare;
    print "Lines in oldfile but not newfile:\n",
      sort grep $compare{$_} !~ /b/, keys %compare;

I used this technique in a recent post as well, and you might see it more clearly there.

-- Randal L. Schwartz, Perl hacker

Replies are listed 'Best First'.
Re: Re: Compare2Files LinebyLine
by thesundayman (Novice) on Sep 26, 2001 at 21:05 UTC
    Thanks a ton. Can't imagine the feeling of having a reply from the great Merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n squared) way, since I make at most n comparisons (as the files are sorted first); I keep emptying the @temp array, if you noticed. Thanks again though.
        Hi Folks. Do you guys happen to have any suggestions for comparing 2 files line by line that don't involve loading all the lines into memory? I'm trying to compare two files that are each over 300MB in size. My system doesn't have enough memory to handle loading all the file lines into a hash. I've tried the readline approach but it takes forever to run. Unfortunately, I'm not able to load the data into a database either - even a Berkeley DB. Any ideas would be appreciated.
        Right as always :-)
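
The sorted-files idea mentioned in the first reply can also be done without holding either file in memory. What follows is only a minimal sketch, not code from this thread: `next_line` and `diff_sorted` are hypothetical helper names, and it assumes both inputs are already sorted in plain byte order (for example with an external `LC_ALL=C sort`, which works on disk rather than in RAM). Unlike the hash version above, it does not lowercase lines or skip comments, and a line duplicated in one file is reported once per unmatched copy.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line, strip the newline, undef at EOF.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Hypothetical helper: walk two filehandles whose lines are ASSUMED to be
# sorted in the same order that Perl's "cmp" uses. Only one line per file
# is held in memory at any time.
sub diff_sorted {
    my ($old_fh, $new_fh, $only_old, $only_new) = @_;
    my $old = next_line($old_fh);
    my $new = next_line($new_fh);
    while (defined $old and defined $new) {
        my $order = $old cmp $new;
        if ($order < 0) {            # line exists only in the old file
            $only_old->($old);
            $old = next_line($old_fh);
        }
        elsif ($order > 0) {         # line exists only in the new file
            $only_new->($new);
            $new = next_line($new_fh);
        }
        else {                       # same line in both: skip it
            $old = next_line($old_fh);
            $new = next_line($new_fh);
        }
    }
    # Whatever is left in either file has no partner in the other.
    while (defined $old) { $only_old->($old); $old = next_line($old_fh); }
    while (defined $new) { $only_new->($new); $new = next_line($new_fh); }
}

# Demo on in-memory filehandles; real use would open the two sorted files.
my $old_data = "apple\nbanana\ncherry\n";
my $new_data = "banana\ncherry\ndate\n";
open my $old_fh, '<', \$old_data or die "open: $!";
open my $new_fh, '<', \$new_data or die "open: $!";
diff_sorted($old_fh, $new_fh,
    sub { print "only in oldfile: $_[0]\n" },
    sub { print "only in newfile: $_[0]\n" });
# prints:
#   only in oldfile: apple
#   only in newfile: date
```

Each line is read once, so after the sort the comparison itself is a single pass over both files. The cost of the approach moves into the sorting step, which has to use the same ordering as `cmp` for the walk to be correct.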
