PerlMonks  

Re^2: Best technique to code/decode binary data for inter-machine communication?

by flexvault (Parson)
on Aug 15, 2012 at 22:26 UTC (#987642)


in reply to Re: Best technique to code/decode binary data for inter-machine communication?
in thread Best technique to code/decode binary data for inter-machine communication?

Hello BrowserUk,

That was my first try, but I had buffer problems. I may have done something wrong, so I will look at that solution. I'm glad you reminded me, since that would be the best solution, but some clients just hung. Again, I'll revisit that and let you know.

I implemented your earlier solution using 'fork', which I'm much more familiar with. This week I wanted to read up on 'threads', but the new Camel book removed chapter 17 on threads...what a disappointment. Are there any good 'paper books' on the subject? Old habits: I like to mark up the pages. It helps me when I go back for reference.

Regards...Ed

"Well done is better than well said." - Benjamin Franklin


Re^3: Best technique to code/decode binary data for inter-machine communication?
by SuicideJunkie (Priest) on Aug 15, 2012 at 22:41 UTC

    Were you using \n as a record separator in your protocol by any chance? Binmode would prevent conversions in transmission, but the clients would be parsing the input differently, and some would never see a "\n" since they're really looking for "\r\n".

      some would never see a "\n" since they're really looking for "\r\n"

      No, lack or presence of a "\r" is not going to mess up line-oriented I/O on any version of Perl [1]. Perl on Windows has no problem reading files that lack "\r" characters. Perl on Unix has no problem reading files that contain "\r" characters (it just includes the "\r" in the returned string).

      But a common mistake with using sockets with Perl is using <$sock>, which will hang forever until a newline or end-of-file arrives. (Using print on a socket shouldn't be a problem, as sockets shouldn't default to buffered mode.)
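To illustrate the hang described above, here is a minimal sketch, assuming a Unix-like system where socketpair with AF_UNIX is available. The handle names $writer and $reader are made up for the example. It writes data with no newline and then uses sysread, which returns whatever has arrived instead of waiting for a line terminator:

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# A connected pair of stream sockets inside one process (Unix-like systems).
socketpair( my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";

# Send some bytes that do NOT end in a newline.
my $msg = 'no newline here';
defined syswrite( $writer, $msg ) or die "syswrite: $!";

# my $line = <$reader>;   # would block: no "\n" and no EOF has arrived

# sysread returns as soon as any data is available, up to the given limit.
my $n = sysread( $reader, my $chunk, 4096 );
die "sysread: $!" unless defined $n;

print "got $n bytes: $chunk\n";
```

The trade-off is that sysread gives you no record boundaries at all, so the protocol has to supply them some other way, for example with a length prefix.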

      [1] Now that ancient Mac Perl's mistake of pseudo ASCII is history. But avoiding binmode wouldn't help in that case anyway.

      - tye        

Re^3: Best technique to code/decode binary data for inter-machine communication?
by BrowserUk (Pope) on Aug 16, 2012 at 00:38 UTC

    As SuicideJunkie suggests, you were probably trying to use line-oriented xfer functions (i.e. print and readline) on a binmoded socket.

    My recommendation would be to use pack/unpack & send/recv like this:

    $to->send( pack 'n/a*', $binData );
    ...
    $from->recv( my $len, 2 );
    $from->recv( my $binData, unpack 'n', $len );

    That's good for packets up to 64k in length. Switch to 'N' to handle up to 4GB.

    The nice thing about this is that the receiver always knows how much to ask for; and can verify that he got it (length $binData) which avoids the need for delimiters and works just as well with non-blocking sockets if you need to go that way.
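A fuller sketch of the same length-prefixed framing follows. It is not the code from the post: it swaps send/recv for syswrite/sysread in a loop, on the assumption that a stream socket may deliver fewer bytes than requested in one call. The helper names read_exactly, send_frame and recv_frame are hypothetical, and the demo uses socketpair, so it assumes a Unix-like system.

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# Read exactly $want bytes, looping because one sysread may return less.
# Returns nothing (undef in scalar context) on EOF or error.
sub read_exactly {
    my ( $fh, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        my $got = sysread( $fh, $buf, $want - length($buf), length($buf) );
        return unless $got;    # undef = error, 0 = EOF
    }
    return $buf;
}

# Prefix the payload with its length as a network-order unsigned short.
sub send_frame {
    my ( $fh, $data ) = @_;
    die "payload too long for a 16-bit length\n" if length($data) > 0xFFFF;
    my $frame = pack 'n/a*', $data;
    my $off   = 0;
    while ( $off < length($frame) ) {
        my $n = syswrite( $fh, $frame, length($frame) - $off, $off );
        die "syswrite: $!\n" unless defined $n;
        $off += $n;
    }
    return 1;
}

# Read the 2-byte length, then exactly that many payload bytes.
sub recv_frame {
    my ($fh) = @_;
    my $head = read_exactly( $fh, 2 );
    return unless defined $head;
    return read_exactly( $fh, unpack 'n', $head );
}

# Demo: both ends live in this process.
socketpair( my $to, my $from, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
binmode $_ for $to, $from;

my $payload = "\x00\r\n\x1A some binary bytes";
send_frame( $to, $payload );
my $got = recv_frame($from);
print "round trip ok\n" if defined $got && $got eq $payload;
```

Because the receiver always knows the expected length, it can compare length($got) against the prefix and needs no delimiter, so bytes like "\n" or "\x1A" in the payload are harmless.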

    Important update: If using this method to transmit data between machines, see also the thread at Mystery! Logical explanation or just Satan's work?

    I also found that when it comes to transmitting arrays and hashes, using pack/unpack is usually more compact (and therefore faster) than using Storable, because (for example) an integer always requires 4 or 8 bytes in binary, but for many values it is shorter in ASCII:

    use Storable qw[ freeze ];;
    @a = 1..100;;
    $packed = pack 'n/(n/a*)', @a;;
    print length $packed;;
    394
    $ice = freeze \@a;;
    print length $ice;;
    412
    @b = unpack 'n/(n/a*)', $packed;;
    print "@b";;
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ...

    %h = 'aaaa'..'aaaz';;
    $packed = pack 'n/(n/a*)', %h;;
    print length $packed;;
    158
    $ice = freeze \%h;;
    print length $ice;;
    202
    %h2 = unpack 'n/(n/a*)', $packed;;
    pp \%h2;;
    {
      aaaa => "aaab",
      aaac => "aaad",
      aaae => "aaaf",
      aaag => "aaah",
      aaai => "aaaj",
      aaak => "aaal",
      aaam => "aaan",
      aaao => "aaap",
      aaaq => "aaar",
      aaas => "aaat",
      aaau => "aaav",
      aaaw => "aaax",
      aaay => "aaaz",
    }

    It doesn't always work out smaller, but it is usually faster and platform independent.

    Of course, Storable wins if your data structures can contain references to others.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I have had some interesting experiences with Storable, in the form of data which, once frozen, could not be thawed!   This was on an AS/400, and it was very data-specific, and I do not know if it was a momentary bug in whatever-it-was version of the CPAN module.   But as it was, I had to quickly scramble and store the data in the database in a different format.   (Fortunately, this was an SQLite file that didn’t have to be shared with anyone, but the occurrence of the problem surprised me greatly, nonetheless.)

      BrowserUk,

      First I'm interested in the code sample you gave:

      $to->send( pack 'n/a*', $binData );
      Currently I write that as:
      $binData = pack('N',length( $data ) ) . $data; $to->send( $binData );
      Is your code a shorthand for the above?

      Second, as you and others have pointed out, I did not use 'binmode' after opening the socket. If I were to add the following:

      binmode Socket, ":raw";
      To both the client and server code, would I be in 'binary' mode on Windows, *nix, etc., or would I need to have different client code for each? Reading the latest 'binmode' documentation, it sounds like the function would be ignored on some systems and then used where binary and text definitions differ.

      Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

      Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back? Until I read the 'binmode' documentation, I didn't think of that possibility!

      Thank you

      "Well done is better than well said." - Benjamin Franklin

        Is your code a shorthand for the above?

        Yes, (kinda:), but more flexible and quicker.

        The template n/a* says: pack as many arbitrary binary bytes as are contained in the argument, counting them as you go, and then prepend that data with that count as a network-order unsigned short. C/a* would pack the count as a single byte; N/a* as a network-order unsigned long; and so on.

        The really powerful template is N/(n/a*)*, which I use for arrays and hashes. It says: pack each input argument as bytes, each prefixed with its length as a network-order ushort; and then prefix the whole result with a single network-order ulong that is the count of the fields packed.
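To make the byte layout of the two templates concrete, here is a small sketch that hex-dumps what pack produces. The variable names and sample values are made up for illustration:

```perl
use strict;
use warnings;

# n/a*: a 2-byte big-endian length, then the bytes themselves.
my $one = pack 'n/a*', 'abc';
my $one_hex = unpack 'H*', $one;
print "$one_hex\n";     # 0003616263

# N/(n/a*)*: a 4-byte field count, then each field with its own
# 2-byte length prefix.
my @fields = ( 'a', 'bc' );
my $many = pack 'N/(n/a*)*', @fields;
my $many_hex = unpack 'H*', $many;
print "$many_hex\n";    # 0000000200016100026263

# The same template reverses the operation.
my @back = unpack 'N/(n/a*)*', $many;
print "@back\n";        # a bc

# A flat hash is just a list of key/value pairs, so it works the same way.
my %h  = ( k1 => 'v1', k2 => 'v2' );
my %h2 = unpack 'N/(n/a*)*', pack 'N/(n/a*)*', %h;
```

Reading the second dump left to right: 00000002 is the field count, 0001 61 is "a" with its length, and 0002 6263 is "bc" with its length.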

        If I were to add the following: binmode Socket, ":raw"; To both the client and server code, would I be in 'binary' mode on windows, *nix, etc. or would I need to have different client code for each.

        On *nix it will do nothing; on Windows it will turn off crlf modifications (+ prevent the oft-forgotten ^Z == EOF).

        My understanding is that if you use just binmode SOCKET; on all platforms; no translation will be done anywhere and you'll recv exactly what you send.

        When ':raw' first came around, it disabled all PerlIO layers; then they changed it for no good reason and without documentation. Last time I investigated it (on windows only!), it still removed :crlf, but didn't remove all layers. To my knowledge there is no explanation available of what layers get left behind, or why?

        Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

        Storable does have nfreeze (network order freeze) which is defined as a "portable format"; though I've never tried using it between 32/64-bit platforms.
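For reference, a minimal nfreeze/thaw round trip looks like the sketch below. The data structure is invented for the example, and the round trip happens within one process, so it shows the API only and says nothing about 32/64-bit interoperability:

```perl
use strict;
use warnings;
use Storable qw( nfreeze thaw );

# A nested structure, which flat pack templates cannot represent.
my %config = ( name => 'node1', ports => [ 80, 443 ] );

# nfreeze serialises using network byte order for portability.
my $frozen = nfreeze( \%config );

# thaw accepts the output of both freeze and nfreeze.
my $copy = thaw($frozen);

print "$copy->{name}: @{ $copy->{ports} }\n";    # node1: 80 443
```

The frozen string is plain bytes, so it can be sent over a socket with the same length-prefix framing as any other binary payload.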

        That said, there are many horror stories of people being bitten by Storable; though at least half of them can be traced back to misunderstanding or incompetence.

        That said, having 'discovered' the pack 'N/(n/a*)*' method of packing simple arrays and hashes, I would use that in preference to Storable for non-nested hashes and arrays.

        Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back.

        Until you apply some form of encode/decoding operation to a file or data stream, anything you read is just a bunch of bytes.

        If you read bytes and transmit bytes, the receiver gets the data in the same state as if it had read bytes from the original source. If those bytes constitute data encoded in some form you will need to decode it before using it -- but it doesn't matter where (which end of the connection) that decoding happens -- so long as it is done only once and correctly.

        Of course, the definition of 'correct' requires thought. If you transmit utf16le to a big-endian machine, then that machine will need to decode it as 'utf16le' (not just 'utf16' which locally might default to 'utf16be').
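A small sketch of the "decode once, and with the right encoding" point, using the core Encode module. The sample string is made up. The bytes are encoded as UTF-16LE on the sending side and must be decoded with that exact name on the receiving side, whatever the receiver's own byte order:

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Sender: characters -> bytes, with an explicit byte order.
my $text  = "caf\x{e9} \x{263A}";           # 6 characters
my $bytes = encode( 'UTF-16LE', $text );    # 12 bytes, no BOM

# ... $bytes travel over the socket unchanged ...

# Receiver: name the same encoding explicitly.
my $received = decode( 'UTF-16LE', $bytes );
print "match\n" if $received eq $text;

# Guessing the other byte order yields different characters.
my $wrong = decode( 'UTF-16BE', $bytes );
print "mismatch\n" if $wrong ne $text;
```

Nothing in the byte stream says which encoding was used, so the two ends have to agree on it out of band or carry it in the protocol.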

        But "The Unicode Problem" -- how the f*** do you know which of the many Unicode standards was used to encode the data??? -- exists wherever you do the decoding. If the receiving machine had read the same bytes from a file, it still has to either "know" (or guess) which of the Unicode Standards was used to encode the data, because the file could have come from anywhere. (eg. the internet).

        Unicode is a f*** up! And will remain that way until they finally require that each of the various binary formats that are encompassed by the Unicode (non)Standard, prefix all encoded data with something that identifies the encoding.

        From your perspective; if you will be transmitting (say) hashes built from input that has previously been decoded, then you will need to understand the Perl Unicode handling tools. I wish I could point you to a definitive reference, but no such animal seems to exist yet.


