PerlMonks  

Re^4: Best technique to code/decode binary data for inter-machine communication?

by flexvault (Monsignor)
on Aug 16, 2012 at 13:13 UTC ( [id://987758]=note )


in reply to Re^3: Best technique to code/decode binary data for inter-machine communication?
in thread Best technique to code/decode binary data for inter-machine communication?

BrowserUk,

First I'm interested in the code sample you gave:

$to->send( pack 'n/a*', $binData );
Currently I write that as:
$binData = pack('N',length( $data ) ) . $data; $to->send( $binData );
Is your code a shorthand for the above?

Second, as you and others have pointed out, I did not use 'binmode' after opening the socket. If I were to add the following:

binmode Socket, ":raw";
To both the client and server code, would I be in 'binary' mode on windows, *nix, etc. or would I need to have different client code for each. Reading the latest 'binmode' documentation, it sounds like the function would be ignored on some systems and then used where binary and text definitions differ.

Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back. Until I read the 'binmode' documentation, I didn't think of that possibility!

Thank you

"Well done is better than well said." - Benjamin Franklin

Replies are listed 'Best First'.
Re^5: Best technique to code/decode binary data for inter-machine communication?
by BrowserUk (Patriarch) on Aug 16, 2012 at 14:16 UTC
    Is your code a shorthand for the above?

    Yes, (kinda:), but more flexible and quicker.

    The template n/a* says: pack as many arbitrary binary bytes as are contained in the argument, counting them as you go, and then prepend that data with that count as a network-order unsigned short. C/a* would pack the count as a single byte; N/a* as a network-order unsigned long; and so on.
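A minimal sketch of the equivalence described above, assuming a small in-memory payload (the variable names are illustrative, not from the thread). It compares the slash template with a hand-rolled prefix and shows that unpack accepts the same template. Note that a 16-bit count can only represent lengths up to 65535, which is one reason the N variant may be the safer choice for larger messages.

```perl
use strict;
use warnings;

# Hypothetical payload containing bytes that are not printable text.
my $data = "\x00\x01binary\xFFpayload";

# Slash template: 16-bit big-endian length, then the bytes themselves.
my $short = pack 'n/a*', $data;

# Hand-rolled equivalent using the same 16-bit prefix.
my $manual = pack( 'n', length($data) ) . $data;

# The N variant matches the 32-bit prefix used in the original question.
my $long = pack 'N/a*', $data;

# unpack with the same template reads the count, then that many bytes.
my ($decoded) = unpack 'n/a*', $short;

print "same as manual: ", ( $short eq $manual   ? "yes" : "no" ), "\n";
print "round trip ok:  ", ( $decoded eq $data   ? "yes" : "no" ), "\n";
print "N prefix size:  ", length($long) - length($data), " bytes\n";
```

So the one-liner and the two-step version differ only in the width of the length prefix chosen.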

    The really powerful template is N/(n/a*)*, which I use for arrays and hashes. It says: pack each input argument as bytes, each prefixed with its length as a network-order ushort; and then prefix the whole result with a single network-order ulong that is the count of the fields packed.
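The following is a sketch of that template applied to a flat hash, assuming all keys and values are plain byte strings shorter than 64 KiB. The hash contents are made up for illustration. On the unpack side the group is written without a trailing repeat count, since the leading N supplies it.

```perl
use strict;
use warnings;

# Hypothetical non-nested hash to send across the wire.
my %config = ( host => 'example.local', port => '8080', mode => 'raw' );

# Flatten to key/value pairs; keys are sorted so the output is repeatable.
my @fields = map { $_ => $config{$_} } sort keys %config;

# N = number of fields, then each field as 16-bit length + bytes.
my $packed = pack 'N/(n/a*)*', @fields;

# The leading N tells unpack how many n/a* groups to read back.
my %copy = unpack 'N/(n/a*)', $packed;

# Peek at the prefix on its own to see what was counted.
my $count = unpack 'N', $packed;

print "fields packed: $count\n";
print "$_ = $copy{$_}\n" for sort keys %copy;
```

With three key/value pairs the prefix counts six fields, because keys and values are packed as separate items.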

    If I were to add the following: binmode Socket, ":raw"; To both the client and server code, would I be in 'binary' mode on windows, *nix, etc. or would I need to have different client code for each.

    On 'nix it will do nothing; on Windows it will turn off crlf modifications (+ prevent the oft-forgotten ^Z == EOF).

    My understanding is that if you use just binmode SOCKET; on all platforms, no translation will be done anywhere and you'll recv exactly what you send.

    When ':raw' first came around, it disabled all PerlIO layers; then they changed it for no good reason and without documentation. Last time I investigated it (on Windows only!), it still removed :crlf, but didn't remove all layers. To my knowledge there is no explanation available of what layers get left behind, or why.
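One way to check the "recv exactly what you send" claim is a local loopback test. This is only a sketch: it assumes a platform where socketpair with AF_UNIX is available, uses a connected pair of sockets in place of a real client and server, and the helper read_exactly is a hypothetical name. The payload deliberately contains CR, LF and ^Z.

```perl
use strict;
use warnings;
use Socket;

# Read exactly $want bytes; a single sysread may return fewer.
sub read_exactly {
    my ( $fh, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        my $n = sysread( $fh, $buf, $want - length($buf), length($buf) );
        die "read failed: $!" unless defined $n;
        die "peer closed early" if $n == 0;
    }
    return $buf;
}

# A connected pair of local sockets stands in for client and server.
socketpair( my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair failed: $!";

# Plain binmode on both ends, identical code on every platform.
binmode($_) for ( $client, $server );

# Bytes that text-mode translation would be expected to disturb.
my $payload = "line1\r\nline2\n\x1Atail";
my $frame   = pack 'N/a*', $payload;

# The frame is tiny, so one syswrite is assumed to send all of it.
defined( syswrite( $client, $frame ) ) or die "write failed: $!";

# Receiver: read the 4-byte length first, then exactly that many bytes.
my $len      = unpack 'N', read_exactly( $server, 4 );
my $received = read_exactly( $server, $len );

print "identical: ", ( $received eq $payload ? "yes" : "no" ), "\n";
```

A real client would loop on syswrite as well, since large frames may be written in pieces.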

    Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

    Storable does have nfreeze (network order freeze) which is defined as a "portable format"; though I've never tried using it between 32/64-bit platforms.
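For comparison, a small sketch of nfreeze and thaw combined with the same length-prefix framing. The record is invented for illustration, and this does not test the 32/64-bit question raised above. Storable's own documentation warns against thawing data from untrusted sources, so treat this as suitable only between cooperating machines.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical nested record, which the flat pack template cannot express.
my %record = ( id => 42, name => 'flexvault', tags => [ 'a', 'b' ] );

# nfreeze serializes in network (portable) byte order.
my $frozen = nfreeze( \%record );

# Frame the frozen bytes like any other byte string.
my $frame = pack 'N/a*', $frozen;

# Receiver: strip the length prefix, then thaw back into a structure.
my ($bytes) = unpack 'N/a*', $frame;
my $copy = thaw($bytes);

print "name: $copy->{name}, tags: @{ $copy->{tags} }\n";
```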

    That said, there are many horror stories of people being bitten by Storable; though at least half of them can be traced back to misunderstanding or incompetence.

    That said, having 'discovered' the pack 'N/(n/a*)*' method of packing simple arrays and hashes, I would use that in preference to Storable for non-nested hashes and arrays.

    Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back.

    Until you apply some form of encode/decoding operation to a file or data stream, anything you read is just a bunch of bytes.

    If you read bytes and transmit bytes, the receiver gets the data in the same state as if it had read bytes from the original source. If those bytes constitute data encoded in some form you will need to decode it before using it -- but it doesn't matter where (which end of the connection) that decoding happens -- so long as it is done only once and correctly.
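A sketch of "encode once, decode once" using the core Encode module, assuming the sender holds already-decoded character data and both sides agree on UTF-8. The sample string is arbitrary. Between the two calls the data is just bytes and can be framed like any other payload.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Decoded character string: "cafe" with e-acute, a space, and U+263A.
my $text = "caf\x{e9} \x{263A}";

# Sender: encode exactly once, then frame the resulting bytes.
my $bytes = encode( 'UTF-8', $text );
my $frame = pack 'N/a*', $bytes;

# Receiver: what arrives is raw bytes until it is explicitly decoded.
my ($raw) = unpack 'N/a*', $frame;
my $back = decode( 'UTF-8', $raw );

printf "chars: %d, bytes on the wire: %d\n", length($text), length($raw);
print "round trip ok: ", ( $back eq $text ? "yes" : "no" ), "\n";
```

The character count and the byte count differ, which is why the length prefix has to be computed on the encoded bytes.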

    Of course, the definition of 'correct' requires thought. If you transmit utf16le to a big-endian machine, then that machine will need to decode it as 'utf16le' (not just 'utf16' which locally might default to 'utf16be').
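To illustrate the byte-order point, here is a sketch that encodes a short ASCII string as UTF-16LE and then decodes the same bytes two ways. The encoding names are those provided by the Encode module; the sample text is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "Perl";
my $bytes = encode( 'UTF-16LE', $text );

# Show the wire bytes: low byte first for each 16-bit unit.
print "wire bytes: ", join( ' ', unpack '(H2)*', $bytes ), "\n";

# Decoding with the byte order that was actually used recovers the text.
my $right = decode( 'UTF-16LE', $bytes );

# Decoding the same bytes as big-endian yields four unrelated characters.
my $wrong = decode( 'UTF-16BE', $bytes );

print "decoded as LE matches: ", ( $right eq $text ? "yes" : "no" ), "\n";
print "decoded as BE matches: ", ( $wrong eq $text ? "yes" : "no" ), "\n";
```

Nothing in the bytes themselves says which order was used, so the sender and receiver have to agree on it out of band.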

    But "The Unicode Problem" -- how the f*** do you know which of the many Unicode standards was used to encode the data??? -- exists wherever you do the decoding. If the receiving machine had read the same bytes from a file, it would still have to "know" (or guess) which of the Unicode standards was used to encode the data, because the file could have come from anywhere (e.g. the internet).

    Unicode is a f*** up! And will remain that way until they finally require that each of the various binary formats that are encompassed by the Unicode (non)Standard, prefix all encoded data with something that identifies the encoding.

    From your perspective: if you will be transmitting (say) hashes built from input that has previously been decoded, then you will need to understand the Perl Unicode handling tools. I wish I could point you to a definitive reference, but no such animal seems to exist yet.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      BrowserUk,

      WOW, I like that shorthand format, N/(N/a*)* and N/(n/a*)* , look very flexible for future encoding/decoding uses.

        Until you apply some form of encode/decoding operation to a file or data stream, anything you read is just a bunch of bytes.
      This was my take also!
        Unicode is a f*** up!
      Agreed! I haven't looked at this much, but the spec says '22 bits', but Perl code seems to use '24 bits' for each character with the high-order bits being '00'. Whether other implementations do the same I don't know, but seems like too much room for mis-interpretation.

      Thank you


        I like that shorthand format, N/(N/a*)* and N/(n/a*)* , look very flexible for future encoding/decoding uses.

        See also Mystery! Logical explanation or just Satan's work?.

        the spec says '22 bits', but Perl code seems to use '24 bits' for each character with the high-order bits being '00'.

        utf-8 is a variable width encoding. Each character can require from 1 to 4(*) bytes.

        (*Or 6 bytes depending upon the wind direction and the phases of the moon(**).)

        (**which moon is left unspecified :)
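The variable width is easy to see by encoding a few code points and counting bytes. This sketch picks four arbitrary code points, one from each length class. The highest Unicode code point, U+10FFFF, fits in 21 bits; the 6-byte figure presumably refers to the older, wider definition of UTF-8, which current standards no longer allow.

```perl
use strict;
use warnings;
use Encode qw(encode);

my @sizes;

# One sample code point from each UTF-8 length class.
for my $cp ( 0x41, 0xE9, 0x20AC, 0x1F600 ) {
    my $bytes = encode( 'UTF-8', chr($cp) );
    push @sizes, length($bytes);
    printf "U+%04X -> %d byte(s): %s\n",
        $cp, length($bytes), join( ' ', unpack '(H2)*', $bytes );
}
```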


