I've written a little script which will check for duplicate files by walking down my file system. That part's no problem. What is a problem is how long it takes to get md5 signatures of large files, a couple of which are zipped tar files I'm using for backups.
Right now, I'm simply skipping files that are too big (2**24 bytes, i.e. 16 MiB, or larger), which is inelegant.
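For reference, a sketch of roughly what my script does per file (the function name and chunk size here are just illustrative, not my exact code): hash in chunks so large files don't have to fit in memory, and skip anything at or above the size cutoff.

```python
import hashlib
import os

SIZE_LIMIT = 2 ** 24  # 16 MiB cutoff -- the inelegant part

def md5_signature(path, chunk_size=2 ** 16):
    """Return the hex MD5 digest of a file, or None if it's too big."""
    if os.path.getsize(path) >= SIZE_LIMIT:
        return None  # skipped for now
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in fixed-size chunks rather than slurping the whole file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```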
So, question 1 is: how does MD5's execution time scale with file size? (I would expect linearly, but I'm not sure.)
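I suppose I could measure this myself; something like the following (a quick sketch, not anything rigorous) would hash buffers of doubling size and show whether the time roughly doubles too:

```python
import hashlib
import os
import time

def time_md5(n_bytes):
    """Time one MD5 digest over n_bytes of random data; return seconds."""
    data = os.urandom(n_bytes)
    start = time.perf_counter()
    hashlib.md5(data).hexdigest()
    return time.perf_counter() - start

# Doubling sizes from 1 MiB to 8 MiB; linear scaling means each
# timing should be roughly twice the previous one.
for exp in (20, 21, 22, 23):
    print(2 ** exp, time_md5(2 ** exp))
```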
Question 2: Is there a similarly reliable but quicker algorithm to get a file's signature?
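One workaround I've been considering instead of a faster hash (this is just an idea, not what my script currently does): two files can only be duplicates if their sizes match, so I could group files by size first and only hash within groups that have more than one member. Most large files would then never need hashing at all.

```python
import os
from collections import defaultdict

def size_groups(paths):
    """Group file paths by size; keep only groups that could hold duplicates."""
    groups = defaultdict(list)
    for p in paths:
        groups[os.path.getsize(p)].append(p)
    # Only same-size groups with 2+ files are worth hashing.
    return {size: ps for size, ps in groups.items() if len(ps) > 1}
```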
I'm using the md5 program that came with my computer, which is a MacBook with a 2.1 GHz Intel Core 2 processor, 1 GB RAM, and Mac OS X 10.7.4 (don't laugh; it was free ;))