Re: Compare large files

by JavaFan (Canon)
on Jul 09, 2009 at 19:58 UTC (#778684)

in reply to Compare large files

I'm not quite sure how you're comparing things, but won't something like the following do:

$ perl -ple 's/\s+/ /g' today | sort > today.s
$ perl -ple 's/\s+/ /g' yesterday | sort > yesterday.s
$ comm -3 today.s yesterday.s
Or as a one liner (bash syntax):
$ comm -3 <(perl -ple 's/\s+/ /g' today | sort) <(perl -ple 's/\s+/ /g' yesterday | sort)
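
As a reading aid for the output: `comm -3` suppresses the lines common to both files, prints lines found only in the first file flush left, and prints lines found only in the second file indented by one tab. A minimal sketch that separates the two sets instead, assuming both inputs were normalized as above and sorted with `LC_ALL=C sort` (the function names are hypothetical helpers, not part of `comm`):

```shell
# Both helpers expect two files that are already sorted in the same
# collation order that comm uses; LC_ALL=C pins that to plain byte order.
# (Assumption: the input files were produced with LC_ALL=C sort as well.)

# Lines that appear only in the first file.
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Lines that appear only in the second file.
only_in_second() {
    LC_ALL=C comm -13 "$1" "$2"
}
```

With these, `only_in_first today.s yesterday.s` would list what is new today and `only_in_second today.s yesterday.s` what has disappeared since yesterday.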

Re^2: Compare large files
by mzedeler (Pilgrim) on Jul 09, 2009 at 20:05 UTC

    If you can't use any command line tools (such as comm, as suggested), sort both files (using the sort utility) and read lines from both files, comparing them on the fly. This lets you compare arbitrarily large files with minimal memory overhead.

      If you can't use any command line tools, you can't use sort either....

        Sorry - I wasn't too precise on that one. What I meant was: if the command line tools for file diff and comparison aren't sufficient, use sort and something handwritten as described.
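
The "sort, then read both files in lockstep" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: both inputs are already sorted in byte order (for example with `LC_ALL=C sort`), the name `merge_compare` and the `<` / `>` output markers are made up for this example, and awk stands in for the handwritten part.

```shell
# merge_compare SORTED_A SORTED_B
# Walks two byte-order-sorted files in step, holding one line of each
# in memory. Prints "< line" for lines only in A, "> line" for lines
# only in B, and nothing for lines present in both.
merge_compare() {
    LC_ALL=C awk -v fb="$2" '
        # Read the next line of B; has_b becomes 0 at end of file.
        function next_b() { has_b = ((getline b < fb) > 0); b = b "" }
        BEGIN { next_b() }
        {
            a = $0 ""    # force string (not numeric) comparison
            # Everything in B that sorts before the current A line is B-only.
            while (has_b && b < a) { print "> " b; next_b() }
            if (has_b && b == a) next_b()   # in both files: skip it
            else print "< " a               # only in A
        }
        # Whatever is left in B after A runs out is B-only.
        END { while (has_b) { print "> " b; next_b() } }
    ' "$1"
}
```

A call such as `merge_compare today.s yesterday.s` would then report only the differing lines, without loading either file into memory.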

        You can, however, use File::Sort. Though for one reason or another whenever I've needed to sort data and couldn't use the sort utility, I've always rolled my own.
      Be warned. In Linux you generally should set the environment variable LC_ALL to C before using sort. Otherwise its idea of sorted order does potentially inconvenient things like:
      1,10
      11,1
      1,123
      (What? You were expecting all of the things with ID 1 to be grouped together? Silly programmer, read the documentation!)
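
A small sketch of the workaround: setting `LC_ALL=C` for the sort invocation only, so the comparison is done byte by byte. The wrapper name `sort_bytewise` is a hypothetical helper for this example.

```shell
# Sort in plain byte order regardless of the user's locale settings.
# sort_bytewise is a hypothetical wrapper; arguments pass through to sort.
sort_bytewise() {
    LC_ALL=C sort "$@"
}

# The three sample lines from above, fed through the wrapper:
printf '1,10\n11,1\n1,123\n' | sort_bytewise
# prints:
#   1,10
#   1,123
#   11,1
```

In byte order the comma sorts before the digits, so both lines with ID 1 end up next to each other.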
Re^2: Compare large files
by boardryder (Novice) on Jul 09, 2009 at 20:23 UTC
    I need to try it out, but it does look like it could work. The drawbacks are that I would need to create two new files, which uses more disk space, and that I was hoping to use pure Perl. I also need to figure out how to work with the output of comm.

    My data looks as follows. The keys I was referring to are the paths of files or directories. File and directory sizes can change, and a file or directory may or may not exist in file1 compared to file2.
    File1:
    /home/users/ DIR 5555
    /home/users/file FILE 324
    /home/users/file2 FILE 435
    ....
    ....
    File2:
    /home/users/file FILE 555
    /home/users/ DIR 5888
    /home/users/file2 FILE 435
    ....
    ....
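
For data keyed by path like this, one possible sketch is to tag each line with its file of origin, sort the combined stream by path, and report each path once. It assumes every line is exactly `path type size` with no whitespace inside the path; the name `compare_listings` and the added/removed/changed wording are invented for this example. It is shell and awk rather than pure Perl, and sort may still use temporary space for very large inputs.

```shell
# compare_listings FILE1 FILE2
# Assumes lines of the form "path type size", with no spaces in the path.
compare_listings() {
    # Prefix each line with 1 or 2 so its origin survives the sort.
    { sed 's/^/1 /' "$1"; sed 's/^/2 /' "$2"; } |
    LC_ALL=C sort -k2,2 -k1,1 |
    awk '
        # Report the path collected so far, if it differs between files.
        function report() {
            if (path == "") return
            if (old == "")       print "added " path
            else if (new == "")  print "removed " path
            else if (old != new) print "changed " path " (" old " -> " new ")"
        }
        # A new path starts: report the previous one and reset the state.
        ($2 "") != path { report(); path = $2 ""; old = new = "" }
        $1 == 1 { old = $3 " " $4 }   # type and size from FILE1
        $1 == 2 { new = $3 " " $4 }   # type and size from FILE2
        END     { report() }
    '
}
```

On the sample data above this would report `/home/users/` and `/home/users/file` as changed and stay silent about `/home/users/file2`.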
Re^2: Compare large files
by QM (Vicar) on Jul 09, 2009 at 21:27 UTC
    Would diff -B be a better choice here for ignoring whitespace differences?
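
    On the flag itself: in GNU diff, `-B` ignores changes that only insert or delete blank lines, while `-b` ignores changes in the amount of whitespace within a line and `-w` ignores whitespace entirely. A small sketch of the `-b` variant follows; the function name is a hypothetical helper. Note that diff compares lines in order, so unlike the sort-and-comm approach it will also report lines that merely moved.

```shell
# Succeeds (exit status 0) when the two files differ at most in the
# amount of whitespace within lines; line order still has to match.
same_ignoring_space() {
    diff -b "$1" "$2" > /dev/null
}
```

    For example, `same_ignoring_space today yesterday && echo "no real changes"` would print the message only when nothing but spacing differs.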

    Quantum Mechanics: The dreams stuff is made of
