http://www.perlmonks.org?node_id=913750

packetstormer has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
Any pointers on how one might get the sum of the binary values of a string? e.g.
$string = "This is a string to change to binary";
I need to figure out how to get the binary value for each character in this string, including spaces, then get the sum of them all. It's for a checksum part of an app I am writing, but I can't figure this out at all. Any suggestions on modules I might use?

Replies are listed 'Best First'.
Re: Binary value of string
by BrowserUk (Patriarch) on Jul 11, 2011 at 19:08 UTC

    $sum = 0; $sum += $_ for unpack 'C*', "This is a string to change to binary"; print $sum;;
    3325

    Or

    print unpack '%32C*', "This is a string to change to binary";;
    3325

    But the result will be modulo 2**32 (assuming it ever gets that high, which won't happen until your strings are at least 16 million characters long).

    If that was a concern, on a perl with 64-bit ints you can do:

    print unpack '%64C*', "This is a string to change to binary";;
    3325

    Which will handle strings to at least 72057594037927936 bytes, which should be enough to be going on with :)
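
For anyone who wants to check that the two forms above agree, here is a minimal sketch. The variable names are only illustrative; the string is the one from the question.

```perl
use strict;
use warnings;

my $string = "This is a string to change to binary";

# Explicit loop: add up the value of every byte.
my $loop_sum = 0;
$loop_sum += $_ for unpack 'C*', $string;

# Same sum computed inside unpack, kept modulo 2**32 by the %32 prefix.
my $checksum32 = unpack '%32C*', $string;

print "loop: $loop_sum, %32C*: $checksum32\n";   # both are 3325
```

The `%64C*` form is left out here because it needs a perl built with 64-bit integers.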


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I like seeing answers that use both "ord" and "unpack". It brings up a question: is it a binary buffer or a buffer of text? The two methods can give different answers with Unicode. Unpack with "C" will extract values byte by byte; ord always works character by character, and those characters could be multi-byte.

      If you use "W" with unpack, then it will behave the same as ord.
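
The difference can be sketched as follows. The sample text and the variable names are made up for illustration, and the text is encoded to UTF-8 explicitly (an assumption about what "the bytes" would be) so that the byte view is unambiguous.

```perl
use strict;
use warnings;
use Encode qw(encode);
use List::Util qw(sum);

my $text  = "caf\x{e9}";              # 4 characters; the last one is U+00E9
my $bytes = encode('UTF-8', $text);   # 5 bytes; U+00E9 becomes 0xC3 0xA9

my $sum_ord   = sum(map { ord($_) } split //, $text);   # character by character
my $sum_w     = sum(unpack('W*', $text));               # same values as ord
my $sum_bytes = sum(unpack('C*', $bytes));              # byte by byte

print "ord: $sum_ord, W*: $sum_w, C* on UTF-8 bytes: $sum_bytes\n";
# ord and W* both give 531; the UTF-8 bytes sum to 662
```

Which of the two is "right" depends on whether the checksum is meant to cover the text or the bytes that actually go over the wire.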

        If you use "W" with unpack, then it will behave the same as ord.

        Is there any point in checksumming using unicode ordinals?

        Sum-the-bytes checksums are pretty useless -- you can perform any transpositions, shuffle or reverse the entire string and detect nothing -- that's why CRCs, Adler etc. were invented.

        The only (scant) merit of sum-the-bytes is that it is very fast. What would be achieved by slowing that to a crawl by forcing it to pick its way through the technical abortion that is multi-byte character encodings? You certainly aren't going to gain any greater guarantee of integrity.

        My gut feel is that, as there are so many different "unicode standard" encodings out there in the wild, the chances of getting false positives from undetected transmission errors using sum-the-ordinals values are far higher than using sum-the-bytes values.
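
To illustrate the point about reordering, here is a small sketch that compares a plain byte sum with a textbook Adler-32. The sub names `byte_sum` and `adler32` are hypothetical, and this is a hand-written illustration, not the code of any particular module.

```perl
use strict;
use warnings;

# Plain sum of the byte values, as discussed in this thread.
sub byte_sum {
    my ($bytes) = @_;
    return unpack '%32C*', $bytes;
}

# Textbook Adler-32: $s2 accumulates the running value of $s1,
# so the position of each byte affects the result.
sub adler32 {
    my ($bytes) = @_;
    my ($s1, $s2) = (1, 0);
    for my $byte (unpack 'C*', $bytes) {
        $s1 = ($s1 + $byte) % 65521;
        $s2 = ($s2 + $s1)   % 65521;
    }
    return ($s2 << 16) | $s1;
}

my $msg      = "This is a string to change to binary";
my $reversed = scalar reverse $msg;

printf "original: sum=%d adler32=%08x\n", byte_sum($msg),      adler32($msg);
printf "reversed: sum=%d adler32=%08x\n", byte_sum($reversed), adler32($reversed);
```

The two sums come out identical, while the Adler-32 values differ because of the second accumulator.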


Re: Binary value of string
by zek152 (Pilgrim) on Jul 11, 2011 at 19:05 UTC

    I would advise you to look at a standard hashing algorithm (like SHA-1, SHA-2 or MD5).

    If you still want to do it yourself:

    $string = "This is a string to split and sum";
    @chars = split(//,$string);
    $sum = 0;
    for(@chars)
    {
        #ord returns the numeric value of the char
        printf"char = %s:bin = %8b:dec = %d\n",$_,ord,ord;
        $sum+=ord;
    }
    print "sum = $sum\n";

    #OUTPUT
    #char = T:bin =  1010100:dec = 84
    #char = h:bin =  1101000:dec = 104
    #char = i:bin =  1101001:dec = 105
    #char = s:bin =  1110011:dec = 115
    #char =  :bin =   100000:dec = 32
    #char = i:bin =  1101001:dec = 105
    #char = s:bin =  1110011:dec = 115
    #char =  :bin =   100000:dec = 32
    #char = a:bin =  1100001:dec = 97
    #char =  :bin =   100000:dec = 32
    #char = s:bin =  1110011:dec = 115
    #char = t:bin =  1110100:dec = 116
    #char = r:bin =  1110010:dec = 114
    #char = i:bin =  1101001:dec = 105
    #char = n:bin =  1101110:dec = 110
    #char = g:bin =  1100111:dec = 103
    #char =  :bin =   100000:dec = 32
    #char = t:bin =  1110100:dec = 116
    #char = o:bin =  1101111:dec = 111
    #char =  :bin =   100000:dec = 32
    #char = s:bin =  1110011:dec = 115
    #char = p:bin =  1110000:dec = 112
    #char = l:bin =  1101100:dec = 108
    #char = i:bin =  1101001:dec = 105
    #char = t:bin =  1110100:dec = 116
    #char =  :bin =   100000:dec = 32
    #char = a:bin =  1100001:dec = 97
    #char = n:bin =  1101110:dec = 110
    #char = d:bin =  1100100:dec = 100
    #char =  :bin =   100000:dec = 32
    #char = s:bin =  1110011:dec = 115
    #char = u:bin =  1110101:dec = 117
    #char = m:bin =  1101101:dec = 109
    #sum = 3043
Re: Binary value of string
by bluescreen (Friar) on Jul 11, 2011 at 18:56 UTC

    untested

    my $string = "some string";
    my $sum = 0;
    $sum += ord($_) for( split('',$string) );
Re: Binary value of string
by Gulliver (Monk) on Jul 11, 2011 at 22:47 UTC

    Why are you writing your own checksum routine?

    use String::CRC32;
    print crc32("This is a string to change to binary"), "\n";
    print crc32("Thsi is a string to change to binary"), "\n";
    print crc32("This is a string to change ot binary"), "\n";

    Output:
    2182425632
    3373251475
    2370952689
Re: Binary value of string
by elTriberium (Friar) on Jul 11, 2011 at 20:14 UTC

    I also recommend using an existing hashing algorithm. Just using the binary value of a character and then calculating the sum won't detect any character transposition. The values for the following 3 strings will all be the same:

  • "A String"
  • "String A"
  • "gnirtS A"

      Examples using actual words:

      • loaf and foal
      • tort and trot
      • ant and tan
      • tame and mate
      • time and mite
      • tome and mote
      • ...
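
A quick way to check a few of the pairs above is sketched below. `byte_sum` is a hypothetical helper, and SHA-256 from the core Digest::SHA module stands in for "an existing hashing algorithm"; any of the digests suggested in this thread would serve the same purpose.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: plain sum of the byte values.
sub byte_sum { return unpack '%32C*', $_[0] }

my @pairs = (
    [ 'A String', 'String A' ],
    [ 'A String', 'gnirtS A' ],
    [ 'loaf',     'foal'     ],
    [ 'tort',     'trot'     ],
);

for my $pair (@pairs) {
    my ($x, $y) = @$pair;
    printf "%-10s %-10s sums %s, SHA-256 digests %s\n",
        "'$x'", "'$y'",
        byte_sum($x) == byte_sum($y)     ? 'match' : 'differ',
        sha256_hex($x) eq sha256_hex($y) ? 'match' : 'differ';
}
```

Every pair reports matching sums and differing digests, which is the weakness described above.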