http://www.perlmonks.org?node_id=1038576


in reply to Re^3: Digest::SHA gives different values for unix/windows
in thread Digest::SHA gives different values for unix/windows

I'm dumbfounded by the "feature" of a mode argument to the Digest::SHA::addfile() method. Why does a module whose simple purpose is to compute the message digest of a block of data permit monkeying with that block of data? It violates the principle of separation of concerns. Look at the trouble it caused rmahin.
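Concretely, one of the data-altering modes is "U" (universal newlines). The module's documentation describes it as converting DOS and Mac OS line terminators to Unix newlines before hashing text files. The sketch below writes the same two lines once with LF and once with CRLF endings. Read with "b", the two files hash differently because their bytes differ. Read with "U", they should give the same digest. SHA-256, the file contents and the helper name write_temp are illustrative choices, not anything from the thread.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Temp qw(tempfile);

# Hypothetical helper: write $content byte for byte to a temp file, return its name.
sub write_temp {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh or die "close failed: $!";
    return $name;
}

my $lf_file   = write_temp("line one\nline two\n");
my $crlf_file = write_temp("line one\r\nline two\r\n");

# "b": hash the bytes exactly as they are stored on disk.
my $lf_b   = Digest::SHA->new(256)->addfile($lf_file,   "b")->hexdigest;
my $crlf_b = Digest::SHA->new(256)->addfile($crlf_file, "b")->hexdigest;

# "U": line terminators in text files are normalized to LF before hashing.
my $lf_u   = Digest::SHA->new(256)->addfile($lf_file,   "U")->hexdigest;
my $crlf_u = Digest::SHA->new(256)->addfile($crlf_file, "U")->hexdigest;

print "mode b: ", ($lf_b eq $crlf_b ? "same" : "different"), "\n";
print "mode U: ", ($lf_u eq $crlf_u ? "same" : "different"), "\n";
```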

There's no such feature in Digest::MD5.
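By contrast, Digest::MD5's addfile() takes only a filehandle. Opening the file, and deciding whether to binmode it, is therefore left to the caller, and its documentation recommends putting the handle in binmode first. Digest::SHA's addfile() accepts an already-open handle too, so the same pattern works for both modules without any $mode argument. Here is a minimal sketch. digest_handle_of is a hypothetical helper name, SHA-256 is an arbitrary choice, and the file holds the standard test string "abc".

```perl
use strict;
use warnings;
use Digest::MD5;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path with no line-ending translation and feed
# the handle to the given digest object's addfile() method.
sub digest_handle_of {
    my ($digest, $path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;    # raw bytes on every platform
    my $hex = $digest->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "abc";
close $tmp or die "close failed: $!";

my $md5    = digest_handle_of(Digest::MD5->new,      $name);
my $sha256 = digest_handle_of(Digest::SHA->new(256), $name);
print "MD5:     $md5\n";      # 900150983cd24fb0d6963f7d28e17f72
print "SHA-256: $sha256\n";   # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The commented values are the published MD5 and SHA-256 test vectors for "abc". Because your own code opens the handle, the bytes hashed are exactly the bytes on disk, whichever module computes the digest.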

Re^5: Digest::SHA gives different values for unix/windows
by zork42 (Monk) on Jun 13, 2013 at 04:50 UTC
    Hi, this is my first post.
    Hello Monks!

    Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.
    Otherwise you'd have to do something like this to generate the hash of a file in binary mode:
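    use Digest::SHA;
    open my $fh, '<', $filename or die "Can't open $filename: $!";
    binmode $fh;
    my $sha = Digest::SHA->new(256);   # pick the SHA variant you need
    # read N bytes at a time; don't want to read the whole file into memory if it's a big file
    while (read($fh, my $buffer, 65536)) {
        $sha->add($buffer);
    }
    close $fh;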
    It's easier to just call $sha->addfile($filename, "b").
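To check that a hand-rolled chunked loop and addfile($filename, "b") really hash the same bytes, here is a small self-test. It is a sketch, not code from the thread. SHA-256, the helper name sha256_chunked and the chunk sizes are illustrative, and the test data mixes CRLF, a lone CR and a NUL byte so that any translation would change the result.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical helper: SHA-256 of a file read in fixed-size binary chunks,
# so a large file never has to fit in memory at once.
sub sha256_chunked {
    my ($filename, $chunk_size) = @_;
    open my $fh, '<', $filename or die "Can't open $filename: $!";
    binmode $fh;
    my $sha = Digest::SHA->new(256);
    while (1) {
        my $n = read($fh, my $buffer, $chunk_size);
        die "Read failed on $filename: $!" unless defined $n;
        last if $n == 0;    # EOF
        $sha->add($buffer);
    }
    close $fh;
    return $sha->hexdigest;
}

# Test data mixing CRLF, a lone CR and a NUL byte, so any translation would show.
my $data = "text line\r\nmore\rbinary\0data\n" x 1000;
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $data;
close $tmp or die "close failed: $!";

my $chunked  = sha256_chunked($filename, 65536);
my $via_mode = Digest::SHA->new(256)->addfile($filename, "b")->hexdigest;
print "digests ", ($chunked eq $via_mode ? "match" : "differ"), "\n";
```

The chunk size only affects memory use, not the result, since add() can be called any number of times before hexdigest().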
    It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.
      Hi, this is my first post. Hello Monks!

      Hello! And welcome. Nice first post.

      Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.

      It's intended as a convenience feature much like the ASCII and IMAGE (BINARY) modes of FTP. But I would argue it's a mis-feature—or at least a misplaced feature. It's akin to the line ending translation feature of FTP, which proved over time to do more harm than good. It was a fine feature in the early days of the Internet when FTP was used mostly by savvy technologists to bandy about lots of source code (plain text). But later, it just caused endless trouble for naïve Internet users who mostly used FTP to transfer large ZIP files and such. ASCII mode became the wrong default. Much bandwidth was wasted transferring large binary files multiple times because they all-too-often got corrupted the first time.
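The FTP failure mode is easy to reproduce in miniature. The 8-byte PNG file signature deliberately contains a CR LF pair so that this kind of newline translation can be detected, which makes it a handy test case. The sketch below applies CRLF-to-LF translation with a regex, standing in for an ASCII-mode transfer or a text-mode read. SHA-256 is an arbitrary choice.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# The 8-byte PNG signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF
my $original = "\x89PNG\r\n\x1a\n";

# Simulate an ASCII-mode transfer or text-mode read: CRLF becomes LF.
(my $translated = $original) =~ s/\r\n/\n/g;

printf "original:   %d bytes  %s\n", length($original),   sha256_hex($original);
printf "translated: %d bytes  %s\n", length($translated), sha256_hex($translated);
```

The translated copy is one byte shorter and hashes differently, which is the same silent corruption that made ASCII mode the wrong default for binary files.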

      It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.

      I think it essentially does. From my reading of the confusing documentation, the right thing to do in the general case (i.e., the case where you don't want line ending translation to happen) is not to use the mode argument of Digest::SHA::addfile() at all. It states:

      "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
        Hi Jim, thanks for the welcome!
        "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
        I agree this is not very clear.
        My interpretation was that Digest::SHA::addfile() defaults to text mode, at least under Windows, where filehandles are opened in text mode unless you explicitly call binmode.

        From perlfunc.html:
        binmode FILEHANDLE, LAYER
        binmode FILEHANDLE

        Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. Returns true on success, otherwise it returns undef and sets $! (errno).

        On some systems (in general, DOS and Windows-based systems) binmode() is necessary when you're not working with a text file. For the sake of portability it is a good idea to always use it when appropriate, and to never use it when it isn't appropriate. Also, people can set their I/O to be by default UTF-8 encoded Unicode, not bytes.

        In other words: regardless of platform, use binmode() on binary data, like for example images.
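The effect of text mode on a digest can be reproduced on any platform. Perl's :crlf PerlIO layer performs the same CRLF-to-LF translation on input that Windows text mode applies by default. The sketch below hashes one CRLF file through a :raw handle and through a :crlf handle. Treating :crlf as a stand-in for a Windows text-mode read is a simplification, and digest_with_layer is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Temp qw(tempfile);

my $crlf_text = "first line\r\nsecond line\r\n";

my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $crlf_text;
close $tmp or die "close failed: $!";

# Hypothetical helper: hash whatever bytes come out of a handle opened with $layer.
sub digest_with_layer {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Can't open $path: $!";
    my $hex = Digest::SHA->new(256)->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

my $raw  = digest_with_layer($name, ':raw');    # bytes exactly as on disk
my $text = digest_with_layer($name, ':crlf');   # CRLF turned into LF while reading

print "raw matches CRLF bytes: ", ($raw eq sha256_hex($crlf_text) ? "yes" : "no"), "\n";
print "crlf matches LF bytes:  ",
    ($text eq sha256_hex("first line\nsecond line\n") ? "yes" : "no"), "\n";
```

This is the same reason the thread's Unix and Windows digests can disagree: the hash is computed over whatever the I/O layer delivers, not over the file as stored.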

        Anyway, I think if I use Digest::SHA::addfile() in the future I'll RTFSC first :)