http://www.perlmonks.org?node_id=1038369

rmahin has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks,

Hopefully this is an easy one. I have a program that transfers files and checks SHA values to ensure success. However, I am now trying to transfer binary files, like tar/zip files, and the SHA values are not matching up when I go from Windows to Unix, or from Unix to Windows. This occurs regardless of whether I use standard FTP or my own code to transfer the files.

Here is the code I'm using:

    use strict;
    use warnings;
    use Digest::SHA;

    my $file = shift;
    print calculateSHA($file);
    print "\n";

    sub calculateSHA {
        my $data = shift;
        my $val  = 0;
        eval {
            my $sha1 = Digest::SHA->new(256);
            #supposed to be portable and os independant? not working.
            $sha1->addfile($data, "p");
            $val = $sha1->hexdigest;
        };
        if ($@) {
            print "An error occurred calculating the SHA-Value. Cannot verify integrity of file\n$@\n";
        }
        return $val;
    }

Any help you can offer is most appreciated!

RESOLVED: I'm really not sure what did it, but eventually using $sha1->addfile($data, "p"); made everything work. I'm a little confused as to why it didn't work when I first tried; I must have missed something. Anyway, everything is working now. Thank you all for the help.

Re: Digest::SHA gives different values for unix/windows
by Jim (Curate) on Jun 12, 2013 at 01:40 UTC
    #supposed to be portable and os independant? not working.

    I suspect it is working just fine.

    The default FTP translation mode is ASCII, which does line ending localization (LF to CRLF, CRLF to LF, etc.). You don't want this for binary files such as compressed archives. You want binary mode instead. See File Transfer Protocol: Communication and data transfer for more details.

    It's a good thing you're computing message digests on both sides of the FTP file transfer. Your strategy worked!

      Well, that would explain why they are different using FTP. However, in the Perl scripts I wrote, I explicitly set the socket to binmode on both the client and the server side. Is there something else I should be doing in addition to this? This is the part I'm particularly interested in, as I only used FTP to see if something in my code was messing it up.

      Thanks for the response!

        Have you used another utility to compute the SHA-256 or MD5 digests of the binary files on both ends of the file transfer? It's not clear to me yet if your problem is that the files are not the same, and so something's wrong with the file transfer, or that the files are the same, and so something's wrong with your digest computation. Unequivocally prove the difference or sameness of the files first, then you'll know with certainty which of two different problems you must solve.

        (I like md5deep for Microsoft Windows.)

      On another note, I tried using binary mode transfer in FTP, and the values were a match. This was using WinSCP. I have yet to replicate this in Perl. Should setting binmode on the socket, and on the file I'm reading in, be enough?

        What protocol are you using in your Perl script to transfer the files via the Internet? I presume from what you've written that you're not using Net::FTP.

        Update:  Does this thread help?
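
Editorial illustration: Jim's point about ASCII-mode translation can be reproduced without a network. The sketch below is not code from the thread. It uses a hypothetical helper, simulate_ascii_transfer, and a made-up sample payload to mimic what a Unix-to-Windows ASCII-mode transfer does to a binary file that happens to contain LF bytes, and then compares the SHA-256 digests before and after.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper (not from the thread): mimic an ASCII-mode transfer
# from Unix to Windows by turning every bare LF into CRLF.
sub simulate_ascii_transfer {
    my ($bytes) = @_;
    (my $translated = $bytes) =~ s/(?<!\r)\n/\r\n/g;
    return $translated;
}

# Made-up "binary" payload that happens to contain two 0x0A bytes.
my $original    = "PK\x03\x04\x0A\x00binary\x0Apayload";
my $transferred = simulate_ascii_transfer($original);

printf "original:    %d bytes, %s\n", length($original),    sha256_hex($original);
printf "transferred: %d bytes, %s\n", length($transferred), sha256_hex($transferred);
print  "digests ",
    (sha256_hex($original) eq sha256_hex($transferred) ? "match" : "differ"), "\n";
```

The translated copy is two bytes longer, so the digest computation is reporting a real difference in the data. Comparing file sizes on both ends is a quick first check along the lines Jim suggests.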

Re: Digest::SHA gives different values for unix/windows
by syphilis (Archbishop) on Jun 12, 2013 at 02:59 UTC
    $sha1->addfile($data, "p");

    I'm not sure that you'd want to use "p" with binary files. (I haven't checked, but I'm thinking it might be applicable only for text files.)
    Try specifying "b" - and if that fixes things then you've found the problem.
    Otherwise, it's probably as Jim said.

    Cheers,
    Rob
      Yeah I tried that too. Even tried a "pb" (not even sure if you can do that or not, the doc is unfortunately not very specific, but figured portable binary files would have been good) and there was absolutely no difference in the resulting value using any of the modes or none at all. :/

        I saw your comment above that the problem seems to be in the transferring of the file. In addition to that, I just wanted to confirm what syphilis suggested: After transfer you should be using the "b" mode. From the Digest::SHA docs (emphasis mine):

        The "p" mode is handy since it ensures that the digest value of $filename will be the same when computed on different operating systems. It accomplishes this by internally translating all newlines in text files to UNIX format before calculating the digest. Binary files are read in raw mode with no translation whatsoever.

        The name "portable" is a bit confusing here.
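
Editorial illustration: a minimal sketch of the "b" mode suggested in this sub-thread, assuming the goal is a SHA-256 digest of the file's bytes exactly as stored on disk. The helper name sha256_of_file is hypothetical; Digest::SHA->new, addfile, and hexdigest are the same calls used in the original question.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: SHA-256 of a file's raw bytes.
sub sha256_of_file {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    # "b" asks Digest::SHA to read the file in binary mode,
    # so no newline translation is applied on any platform.
    $sha->addfile($path, "b");    # dies if the file cannot be read
    return $sha->hexdigest;
}

# Print "digest  filename" for each file named on the command line.
if (@ARGV) {
    print sha256_of_file($_), "  $_\n" for @ARGV;
}
```

Running the same script on both machines and comparing the printed digests shows whether the transferred file is byte-for-byte identical.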

Re: Digest::SHA gives different values for unix/windows
by Anonymous Monk on Aug 19, 2014 at 17:07 UTC
    I ran into a similar issue here. My *nix Perl code, shasum (*nix), and sha256deep (Win) all returned one SHA-256 sum, but the same Perl code on Windows was returning something different. Using "b" or "p" with addfile() did not make a difference. Then it occurred to me that the file handle itself might not have been opened correctly. The sample code above doesn't give any insight into how the file handles were created, so I can't comment on those, but I was using "<" for the mode initially. When I switched to "< :raw", Windows started agreeing with all the other sums, with no other changes needed. I just needed to be explicit about reading the binary files in raw mode instead of whatever Perl picks on that particular platform. Apparently it chooses poorly on Windows. I hope this comment helps someone else who runs into this issue, as my hunch was more helpful to me than the Googling I did.
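
Editorial illustration: the approach described in the last reply might look roughly like the sketch below. The poster's actual code is not shown in the thread, so this is an assumption about its shape. The helper name sha256_via_raw_handle is hypothetical; the point is only that the file is opened with the :raw layer before the handle is passed to addfile.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: open the file ourselves with the :raw layer so that
# no CRLF translation happens on Windows, then hand the handle to Digest::SHA.
sub sha256_via_raw_handle {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open '$path': $!";
    my $sha = Digest::SHA->new(256);
    $sha->addfile($fh);    # reads from the already-open handle
    close($fh);
    return $sha->hexdigest;
}

# Print "digest  filename" for each file named on the command line.
if (@ARGV) {
    print sha256_via_raw_handle($_), "  $_\n" for @ARGV;
}
```

For a handle that is already open, calling binmode($fh) before reading is another way to turn off newline translation.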