<?xml version="1.0" encoding="windows-1252"?>
<node id="991162" title="[OT]:Faster signature algorithm than md5?" created="2012-09-01 11:07:58" updated="2012-09-01 11:07:58">
<type id="115">
perlquestion</type>
<author id="494652">
swampyankee</author>
<data>
<field name="doctext">
&lt;p&gt;I've written a little script that checks for duplicate files by walking my file system.  That part's no problem. The problem is how long it takes to compute md5 signatures of large files, a couple of which are zipped tar files I use for backups.&lt;/p&gt;
&lt;p&gt;Right now, I'm simply skipping files that are too big (2**24 bytes, i.e. 16 MiB, or larger), which is inelegant.&lt;/p&gt;
&lt;p&gt;So, question 1: how does md5's execution time scale with file size? (I would expect linearly, but I'm not sure.)&lt;/p&gt;
&lt;p&gt;Question 2: Is there a similarly reliable but quicker algorithm to get a file's signature?&lt;/p&gt;
&lt;p&gt;I'm using the md5 program that came with my computer, which is a MacBook with a 2.1 GHz Intel Core 2 processor, 1 GB RAM, and Mac OS X 10.7.4 (don't laugh; it was free ;))&lt;/p&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-494652"&gt;
&lt;hr&gt;&lt;p&gt;
Information about American English usage  [http://wsu.edu/~brians/errors/index.html|here] and [http://andromeda.rutgers.edu/~jlynch/Writing/|here]. Floating point issues? Please read [http://docs.sun.com/source/806-3568/ncg_goldberg.html| this ] before posting. &amp;mdash; emc&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;</field>
</data>
</node>
