Re^5: Digest::SHA gives different values for unix/windows

by zork42 (Monk)
on Jun 13, 2013 at 04:50 UTC (#1038670)


in reply to Re^4: Digest::SHA gives different values for unix/windows
in thread Digest::SHA gives different values for unix/windows

Hi this is my first post.
Hello Monks!

Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.
Otherwise you'd have to do something like this to generate the hash of a file in binary mode:

open the file
set binmode
while (not EOF) {
    read N bytes of the file into a buffer   # don't want to read the whole file into memory if it's a big file
    Digest::SHA::add($buffer)
}
close file
It's easier to just call Digest::SHA::addfile($filename, "b").
It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.
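
For comparison, here is a minimal runnable sketch of the manual loop described above, checked against the one-line addfile($filename, "b") call. The helper name sha256_file_binary, the 64 KB chunk size, the choice of SHA-256, and the temporary demo file are all illustrative assumptions, not anything prescribed by Digest::SHA.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hash a file in binary mode, reading fixed-size chunks so that a large
# file never has to fit in memory. The function name and default chunk
# size are illustrative choices.
sub sha256_file_binary {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # no line-ending translation on any platform

    my $sha = Digest::SHA->new(256);
    while (1) {
        my $n = read($fh, my $buffer, $chunk_size);
        die "Read failed on $path: $!" unless defined $n;
        last if $n == 0;    # end of file
        $sha->add($buffer);
    }
    close $fh;
    return $sha->hexdigest;
}

# Demo data: a small temporary file containing CRLF line endings.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "line one\r\nline two\r\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

# The manual loop and the "b" mode of addfile() should agree.
my $manual   = sha256_file_binary($tmp_path);
my $built_in = Digest::SHA->new(256)->addfile($tmp_path, "b")->hexdigest;

print "manual:   $manual\n";
print "built-in: $built_in\n";
```

If the two printed digests match, the "b" mode is doing the same job as the explicit binmode loop, which is the convenience being argued for here.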


Re^6: Digest::SHA gives different values for unix/windows
by Jim (Curate) on Jun 13, 2013 at 19:22 UTC
    Hi this is my first post. Hello Monks!

    Hello! And welcome. Nice first post.

    Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.

    It's intended as a convenience feature much like the ASCII and IMAGE (BINARY) modes of FTP. But I would argue it's a mis-feature, or at least a misplaced feature. It's akin to the line ending translation feature of FTP, which proved over time to do more harm than good. It was a fine feature in the early days of the Internet when FTP was used mostly by savvy technologists to bandy about lots of source code (plain text). But later, it just caused endless trouble for naïve Internet users who mostly used FTP to transfer large ZIP files and such. ASCII mode became the wrong default. Much bandwidth was wasted transferring large binary files multiple times because they all-too-often got corrupted the first time.

    It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.

    I think it essentially does. From my reading of the confusing documentation, the right thing to do in the general case (i.e., the case where you don't want line ending translation to happen) is not to use the mode argument of Digest::SHA::addfile() at all. It states:

    "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
      Hi Jim, thanks for the welcome!
      "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
      I agree this is not very clear.
      My interpretation was that Digest::SHA::addfile() defaults to text mode, at least under Windows, where a filehandle is opened in text mode unless you explicitly call binmode on it.

      From perlfunc.html:
      binmode FILEHANDLE, LAYER
      binmode FILEHANDLE

      Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. Returns true on success, otherwise it returns undef and sets $! (errno).

      On some systems (in general, DOS and Windows-based systems) binmode() is necessary when you're not working with a text file. For the sake of portability it is a good idea to always use it when appropriate, and to never use it when it isn't appropriate. Also, people can set their I/O to be by default UTF-8 encoded Unicode, not bytes.

      In other words: regardless of platform, use binmode() on binary data, like for example images.
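
A small sketch of why this matters for digests: the same file hashed through a translating filehandle and through a raw one gives two different values. Here the PerlIO :crlf layer is applied explicitly as a stand-in for Windows text mode, so the effect is visible on any platform; the helper name sha1_with_layers, the sample data, and the use of SHA-1 are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Temp qw(tempfile);

# Digest a file after opening it with the given PerlIO layer string.
# The helper name is an illustrative choice.
sub sha1_with_layers {
    my ($path, $layers) = @_;
    open my $fh, "<$layers", $path or die "Cannot open $path: $!";
    my $sha = Digest::SHA->new(1);
    while (read($fh, my $buf, 4096)) {
        $sha->add($buf);
    }
    close $fh;
    return $sha->hexdigest;
}

# Write a file with CRLF line endings, byte for byte.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "a\r\nb\r\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $raw  = sha1_with_layers($tmp_path, ':raw');   # bytes exactly as stored
my $text = sha1_with_layers($tmp_path, ':crlf');  # CRLF pairs read as "\n"

print "raw:  $raw\n";
print "text: $text\n";
print "digests ", ($raw eq $text ? "match" : "differ"), "\n";
```

The raw digest corresponds to the bytes on disk, while the translated one corresponds to the file with each CRLF collapsed to a single newline, which is how the same file can end up with different hashes on Unix and Windows.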

      Anyway, I think if I use Digest::SHA::addfile() in the future I'll RTFSC first :)
