Re^6: Digest::SHA gives different values for unix/windows

by Jim (Curate)
on Jun 13, 2013 at 19:22 UTC ( #1038825=note )

in reply to Re^5: Digest::SHA gives different values for unix/windows
in thread Digest::SHA gives different values for unix/windows

Hi this is my first post. Hello Monks!

Hello! And welcome. Nice first post.

Given that some OSes treat binary and text files differently (the latter messing around with line endings as you said in your first post), I think you really need a $mode argument to the Digest::SHA::addfile($filename [, $mode]) method.

It's intended as a convenience feature much like the ASCII and IMAGE (BINARY) modes of FTP. But I would argue it's a mis-feature, or at least a misplaced feature. It's akin to the line ending translation feature of FTP, which proved over time to do more harm than good. It was a fine feature in the early days of the Internet when FTP was used mostly by savvy technologists to bandy about lots of source code (plain text). But later, it just caused endless trouble for naïve Internet users who mostly used FTP to transfer large ZIP files and such. ASCII mode became the wrong default. Much bandwidth was wasted transferring large binary files multiple times because they all-too-often got corrupted the first time.

It might cause fewer problems if Digest::SHA::addfile() defaulted to binary mode.

I think it essentially does. From my reading of the confusing documentation, the right thing to do in the general case (i.e., the case where you don't want line ending translation to happen) is not to use the mode argument of Digest::SHA::addfile() at all. It states:

"By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
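For anyone who would rather not depend on how that sentence is read, here is a minimal sketch that passes the documented "b" (binary) mode explicitly and checks the result against a digest of the same bytes held in memory. The sample text and the temporary file are made up for illustration; only addfile(), hexdigest() and sha256_hex() come from Digest::SHA.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Temp qw(tempfile);

# Made-up sample data with CRLF line endings, written byte for byte.
my $bytes = "line one\r\nline two\r\n";
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} $bytes;
close $fh or die "close failed: $!";

# Ask for binary mode explicitly instead of relying on the default.
my $file_digest = Digest::SHA->new(256)->addfile($path, "b")->hexdigest;

# Digest of the same bytes in memory, for comparison.
my $mem_digest = sha256_hex($bytes);

print $file_digest eq $mem_digest ? "match\n" : "MISMATCH\n";
```

If the two digests ever differ on some platform, something between the disk and the digest is translating line endings.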

Replies are listed 'Best First'.
Re^7: Digest::SHA gives different values for unix/windows
by zork42 (Monk) on Jun 14, 2013 at 07:07 UTC
    Hi Jim, thanks for the welcome!
    "By default, $filename is simply opened and read; no special modes or I/O disciplines are used."
    I agree this is not very clear.
    My interpretation was that Digest::SHA::addfile() will default to text mode, or at least that it would under Windows. On Windows you must explicitly call binmode; otherwise, filehandles you open default to text mode.

    From perlfunc.html:
    binmode FILEHANDLE

    Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. Returns true on success, otherwise it returns undef and sets $! (errno).

    On some systems (in general, DOS and Windows-based systems) binmode() is necessary when you're not working with a text file. For the sake of portability it is a good idea to always use it when appropriate, and to never use it when it isn't appropriate. Also, people can set their I/O to be by default UTF-8 encoded Unicode, not bytes.

    In other words: regardless of platform, use binmode() on binary data, like for example images.
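One way to follow that advice with Digest::SHA, sketched under the assumption that you are willing to open the file yourself: call binmode() on your own filehandle and pass the handle to addfile(), so the module's default for filenames never comes into play. The helper name sha256_of_file is hypothetical, not part of the module.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: digest a file's raw bytes through our own filehandle.
sub sha256_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # read raw bytes: no line-ending translation
    my $hex = Digest::SHA->new(256)->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Print a digest for each file named on the command line, if any.
print sha256_of_file($_), "  $_\n" for @ARGV;
```

Because the handle is already in binary mode when addfile() reads from it, the digest should be the same on Unix and Windows for the same bytes.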

    Anyway, I think if I use Digest::SHA::addfile() in the future I'll RTFSC first :)