http://www.perlmonks.org?node_id=1038543


in reply to Re^2: Digest::SHA gives different values for unix/windows
in thread Digest::SHA gives different values for unix/windows

I saw your comment above that the problem seems to be in the transferring of the file. In addition to that, I just wanted to confirm what syphilis suggested: After transfer you should be using the "b" mode. From the Digest::SHA docs (emphasis mine):

The "p" mode is handy since it ensures that the digest value of $filename will be the same when computed on different operating systems. It accomplishes this by internally translating all newlines in text files to UNIX format before calculating the digest. Binary files are read in raw mode with no translation whatsoever.

The name "portable" is a bit confusing here.
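
For anyone landing here later, a minimal sketch of why line endings matter to the digest at all. It does not use the "p" mode; the `s/\r\n/\n/g` step is my own hand-rolled stand-in for the translation the docs describe, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# The same two lines of text, once with Windows (CRLF) endings
# and once with UNIX (LF) endings. Sample data is hypothetical.
my $crlf = "line one\r\nline two\r\n";
my $lf   = "line one\nline two\n";

# The raw bytes differ, so the digests differ.
my $digest_crlf = sha256_hex($crlf);
my $digest_lf   = sha256_hex($lf);

# Hand-rolled normalization (an assumption, not the module's code):
# rewrite CRLF as LF before hashing.
(my $normalized = $crlf) =~ s/\r\n/\n/g;
my $digest_normalized = sha256_hex($normalized);

print "raw digests match:        ",
    ($digest_crlf eq $digest_lf ? "yes" : "no"), "\n";
print "normalized digests match: ",
    ($digest_normalized eq $digest_lf ? "yes" : "no"), "\n";
```

The first line prints "no" and the second "yes". A transfer that rewrites line endings changes the bytes, and a digest only ever sees bytes.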

Re^4: Digest::SHA gives different values for unix/windows
by Jim (Curate) on Jun 12, 2013 at 20:30 UTC

    I'm dumbfounded by the "feature" of a mode argument to the Digest::SHA::addfile() method. Why does a module whose simple purpose is to compute the message digest of a block of data permit monkeying with that block of data? It violates the principle of separation of concerns. Look at the trouble it caused rmahin.

    There's no such feature in Digest::MD5.

      Hi this is my first post.
      Hello Monks!

      Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.
      Otherwise you'd have to do something like this to generate the hash of a file in binary mode:
      open the file
      set binmode
      while (not EOF) {
          read N bytes of the file into a buffer   # don't want to read the whole file into memory if it's a big file
          Digest::SHA::add($buffer)
      }
      close file
      It's easier to just call Digest::SHA::addfile($filename, "b").
      It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.
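
A rough, runnable version of that comparison, in case it helps. The temp file, the 4096-byte chunk size, and the `digest_by_hand` helper name are my own choices for illustration; only `new`, `add`, `addfile` and `hexdigest` come from Digest::SHA.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical test file with CRLF line endings, written in raw mode
# so the bytes on disk are the same on every OS.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "line one\r\nline two\r\n" x 1000;
close $out or die "close failed: $!";

# The long way: open, binmode, read fixed-size chunks, feed each to add().
sub digest_by_hand {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;
    my $buffer;
    while (1) {
        my $n = read($in, $buffer, 4096);
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # end of file
        $sha->add($buffer);
    }
    close $in;
    return $sha->hexdigest;
}

# The short way: let addfile() read the file in binary mode.
my $by_hand = digest_by_hand($filename);
my $by_mode = Digest::SHA->new(256)->addfile($filename, "b")->hexdigest;

print "by hand: $by_hand\n";
print "addfile: $by_mode\n";
print "same digest: ", ($by_hand eq $by_mode ? "yes" : "no"), "\n";
```

Both routes hash exactly the bytes on disk, so the two digests come out identical; the mode argument just saves the boilerplate.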
        Hi this is my first post. Hello Monks!

        Hello! And welcome. Nice first post.

        Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.

        It's intended as a convenience feature much like the ASCII and IMAGE (BINARY) modes of FTP. But I would argue it's a mis-feature—or at least a misplaced feature. It's akin to the line ending translation feature of FTP, which proved over time to do more harm than good. It was a fine feature in the early days of the Internet when FTP was used mostly by savvy technologists to bandy about lots of source code (plain text). But later, it just caused endless trouble for naïve Internet users who mostly used FTP to transfer large ZIP files and such. ASCII mode became the wrong default. Much bandwidth was wasted transferring large binary files multiple times because they all-too-often got corrupted the first time.

        It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.

        I think it essentially does. From my reading of the confusing documentation, the right thing to do in the general case (i.e., the case where you don't want line ending translation to happen) is not to use the mode argument of Digest::SHA::addfile() at all. It states:

        "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."