PerlMonks  

Best technique to code/decode binary data for inter-machine communication?

by flexvault (Vicar)
on Aug 15, 2012 at 17:34 UTC ( #987602=perlquestion )
flexvault has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm trying to find the best way to encode/decode binary data for transfer between client machines and server machine(s). Different operating systems change or delete characters in the stream. Pack/unpack (hex encoding) is better than any regex I could come up with, but it doubles the size of the data transferred. MIME::Base64 is better than pack/unpack: in the test below it is both smaller (344 vs. 512 characters for 256 bytes of input) and faster. The 'Simple Copy' step is obviously the fastest possible but can't be used for inter-machine communication.

Is there something better than MIME::Base64?

This script shows the best of what I have tested:

    use strict;
    use Benchmark qw(:all);
    use MIME::Base64;

    our $src = "";
    for ( 0..255 ) { $src .= chr( $_ ); }

    my $count = 0;

    print "case1: pack/unpack -- ", &case1, " size, $count, \n";
    print "case2: MIME::Base64 -- ", &case2, " size, $count, \n";
    print "case3: Simple Copy -- ", &case3, " size, $count, \n";

    timethese ($count, {
        case1 => sub {&case1},
        case2 => sub {&case2},
        case3 => sub {&case3},
      },
    );

    cmpthese ($count, {
        case1 => sub {&case1},
        case2 => sub {&case2},
        case3 => sub {&case3},
      },
    );

    sub case1 {
        # print "case1\n";
        our $src;
        my $result = unpack("H*", $src );
        my $back = pack("H*", $result );
        if ( $src ne $back ) { print "1 ne\n"; exit; }
        return length( $result );
    }

    sub case2 {
        # print "case2\n";
        our $src;
        my $result = encode_base64($src,"");
        my $back = decode_base64( $result );
        if ( $src ne $back ) { print "2 ne\n"; exit; }
        return length( $result );
    }

    sub case3 {
        # print "case3\n";
        our $src;
        # my $result = $src;
        my $back = $src;
        if ( $src ne $back ) { print "3 ne\n"; exit; }
        return length( $back );
    }

    __END__
    email7:/usr/local/apache/prefork/cgi-bin/t# pyrperl Hex_vs_B64.plx
    case1: pack/unpack -- 512 size, 0,
    case2: MIME::Base64 -- 344 size, 0,
    case3: Simple Copy -- 256 size, 0,
    Benchmark: running case1, case2, case3 for at least 3 CPU seconds...
         case1:  4 wallclock secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 142625.48/s (n=447844)
         case2:  3 wallclock secs ( 3.11 usr +  0.00 sys =  3.11 CPU) @ 204192.93/s (n=635040)
         case3:  4 wallclock secs ( 3.18 usr +  0.00 sys =  3.18 CPU) @ 1199358.18/s (n=3813959)
               Rate case1 case2 case3
    case1  141559/s    --  -31%  -88%
    case2  205511/s   45%    --  -83%
    case3 1203509/s  750%  486%    --
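The sizes reported by the script follow from the encodings themselves: hex emits two characters per input byte, and Base64 emits four characters per group of three input bytes (rounded up, with padding). The following is only a minimal sketch that checks those two formulas against the real encoders; the helper name `encoded_sizes` and the sample lengths are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: return (hex length, base64 length) for $n bytes
# of arbitrary binary data.
sub encoded_sizes {
    my ($n) = @_;
    my $data = join '', map { chr( $_ % 256 ) } 1 .. $n;
    my $hex  = unpack 'H*', $data;
    my $b64  = encode_base64( $data, '' );    # '' suppresses line breaks
    return ( length($hex), length($b64) );
}

for my $n ( 1, 3, 256, 1000 ) {
    my ( $hex_len, $b64_len ) = encoded_sizes($n);

    # 4 output characters per 3-byte group, last group padded
    my $b64_expected = 4 * int( ( $n + 2 ) / 3 );

    printf "%5d bytes: hex %5d (2n = %d), base64 %5d (formula = %d)\n",
        $n, $hex_len, 2 * $n, $b64_len, $b64_expected;
}
```

For the 256-byte test string this gives 512 and 344, matching the figures printed by the benchmark script.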

Thank you

"Well done is better than well said." - Benjamin Franklin

Re: Best technique to code/decode binary data for inter-machine communication?
by Anonymous Monk on Aug 15, 2012 at 18:27 UTC
    I would use a well-known protocol such as SOAP, using the existing CPAN Perl modules that are built to do that. All of the technical issues have been taken care of, and who really cares exactly how much data-size it takes.

      And the benchmark is ?

      "Well done is better than well said." - Benjamin Franklin

      I would use a well-known protocol such as SOAP, using the existing CPAN Perl modules that are built to do that. All of the technical issues have been taken care of, and who really cares exactly how much data-size it takes.

      LOL

      "who really cares exactly how much data-size it takes"

      Lots of people!

      Data size and protocol overheads are very important things to monitor in my line of work. Our customers have massive amounts of equipment communicating over a county-, territory- or even state-wide 100Mbps* private network. The occasional software and configuration push cannot impede the normal traffic or people might die.

      * typically, some have faster networks
Re: Best technique to code/decode binary data for inter-machine communication?
by zentara (Archbishop) on Aug 15, 2012 at 18:43 UTC
    Is there something better than MIME::Base64?

    Since we are just talking about making printable characters for transmission, you would want to shorten the length of the string, so how about using Math::BaseCnv to go from Base64 to Base128? Or use compression on the file, before doing the base conversion, with something like

        #!/usr/bin/perl
        use Compress::Zlib;
        use MIME::Base64;

        my $str = "Hello World! " x 3;
        my $gzip = Compress::Zlib::memGzip( $str );
        my $hex = unpack 'H*', $gzip;
        my $base64 = encode_base64('Aladdin:open sesame');

        my $str_len = length($str);
        my $gzip_len = length($gzip);
        my $hex_len = length($hex);
        my $base64_len = length($base64);

        # make binary printable ;-)
        $gzip = '#' x $gzip_len;

        printf "%3d: %s\n%3d: %s\n%3d: %s\n%3d: %s\n",
            $str_len, $str, $gzip_len, $gzip, $hex_len, $hex, $base64_len, $base64;

        __DATA__
        39: Hello World! Hello World! Hello World!
        36: ####################################
        72: 1f8b0800000000000003f348cdc9c95708cf2fca495154f0c0c90100b9a8ae3827000000
        29: QWxhZGRpbjpvcGVuIHNlc2FtZQ==
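The snippet above Base64-encodes an unrelated sample string rather than the gzipped data. A minimal sketch of the combined idea (compress first, then Base64 the compressed bytes, and reverse both steps on the receiving side) might look like the following. The helper names `to_wire` and `from_wire` are hypothetical, and the sample input is deliberately repetitive; data that does not compress well can come out larger than plain Base64.

```perl
use strict;
use warnings;
use Compress::Zlib;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: gzip the raw bytes, then make them printable.
sub to_wire {
    my ($raw) = @_;
    my $gz = Compress::Zlib::memGzip($raw);
    die "memGzip failed" unless defined $gz;
    return encode_base64( $gz, '' );    # '' = no line breaks
}

# Hypothetical helper: undo Base64, then gunzip.
sub from_wire {
    my ($wire) = @_;
    my $gz  = decode_base64($wire);
    my $raw = Compress::Zlib::memGunzip($gz);
    die "memGunzip failed" unless defined $raw;
    return $raw;
}

my $str  = "Hello World! " x 300;
my $wire = to_wire($str);

printf "raw %d bytes, gzip+base64 %d bytes, round trip %s\n",
    length($str), length($wire),
    ( from_wire($wire) eq $str ? 'ok' : 'FAILED' );
```

Whether the extra CPU time for compression pays off depends on the data and the link speed, so it is worth benchmarking alongside the other cases.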

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Best technique to code/decode binary data for inter-machine communication?
by rurban (Scribe) on Aug 15, 2012 at 18:43 UTC
    Use any cross-platform serialization library. There also exist cross-language serialization libraries.

    Those 3 are fastest and do both: JSON::XS, Data::MessagePack, protobuf-perlxs or such.

Re: Best technique to code/decode binary data for inter-machine communication?
by BrowserUk (Pope) on Aug 15, 2012 at 18:49 UTC
    Different operating systems change or delete characters in the stream.

    Why not just binmode the socket (pipe/filehandle) that you do the transfer over?

    Then you don't need to do any encoding or decoding.
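As a rough illustration of the binmode suggestion, the sketch below writes all 256 byte values through a binmoded handle and reads them back unchanged. It uses a temporary file as a stand-in for the socket or pipe (an assumption made only to keep the example self-contained), and the helper name `round_trip` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data through a binmoded handle, read it back.
sub round_trip {
    my ($data) = @_;

    my ( $out, $path ) = tempfile( UNLINK => 1 );
    binmode $out;    # no newline translation, no encoding layers
    print {$out} $data or die "write: $!";
    close $out or die "close: $!";

    open my $in, '<', $path or die "open: $!";
    binmode $in;
    my $back = do { local $/; <$in> };    # slurp the whole file
    close $in;
    return $back;
}

my $src  = join '', map { chr } 0 .. 255;
my $back = round_trip($src);

print length($back), " bytes read, ",
    ( $back eq $src ? "identical" : "DIFFERENT" ), "\n";
```

The same two binmode calls apply to socket handles; the buffering and framing issues discussed further down the thread are separate from the byte-translation problem.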


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Hello BrowserUk,

      That was my first try, but I had buffer problems. I may have done something wrong, so I will look at that solution. I'm glad you reminded me, since that would be the best solution, but some clients just hung. Again, I'll revisit that and let you know.

      I implemented your earlier solution using 'fork', which I'm much more familiar with. This week I wanted to read up on 'threads', but the new Camel book removed chapter 17 on threads... what a disappointment. Are there any good 'paper books' on the subject? Old habits: I like to mark up the pages. It helps me when I go back for reference.

      Regards...Ed

      "Well done is better than well said." - Benjamin Franklin

        Were you using \n as a record separator in your protocol by any chance? Binmode would prevent conversions in transmission, but the clients would be parsing the input differently, and some would never see a "\n" since they're really looking for "\r\n".

        As SuicideJunkie suggests, you were probably trying to use line-oriented transfer functions (i.e. print and readline) on a binmoded socket.

        My recommendation would be to use pack/unpack & send/recv like this:

            $to->send( pack 'n/a*', $binData );
            ...
            $from->recv( my $len, 2 );
            $from->recv( my $binData, unpack 'n', $len );

        That's good for packets up to 64k in length. Switch to 'N' to handle up to 4GB.

        The nice thing about this is that the receiver always knows how much to ask for; and can verify that he got it (length $binData) which avoids the need for delimiters and works just as well with non-blocking sockets if you need to go that way.

        Important update: If using this method to transmit data between machines, see also the thread at Mystery! Logical explanation or just Satan's work?
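A self-contained sketch of the length-prefixed framing described above, under some assumptions: it uses a local socketpair in a single process instead of two machines, the 'N' (4-byte) prefix, and sysread/syswrite loops rather than a single send/recv. The loops are there because a stream socket is allowed to deliver or accept fewer bytes per call than requested. The helper names `read_exactly`, `send_frame` and `recv_frame` are hypothetical.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: read exactly $want bytes, looping over short reads.
sub read_exactly {
    my ( $fh, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        my $got = sysread( $fh, $buf, $want - length($buf), length($buf) );
        die "read error: $!"    unless defined $got;
        die "peer closed early" if $got == 0;
    }
    return $buf;
}

# Hypothetical helper: send a 4-byte big-endian length, then the payload.
sub send_frame {
    my ( $fh, $payload ) = @_;
    my $frame = pack 'N/a*', $payload;
    my $off   = 0;
    while ( $off < length $frame ) {
        my $n = syswrite( $fh, $frame, length($frame) - $off, $off );
        die "write error: $!" unless defined $n;
        $off += $n;
    }
    return;
}

# Hypothetical helper: read the length prefix, then exactly that many bytes.
sub recv_frame {
    my ($fh) = @_;
    my $len = unpack 'N', read_exactly( $fh, 4 );
    return read_exactly( $fh, $len );
}

socketpair( my $to, my $from, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
binmode $_ for $to, $from;

my $bin_data = join '', map { chr } 0 .. 255;
send_frame( $to, $bin_data );
my $back = recv_frame($from);

print $back eq $bin_data ? "round trip ok\n" : "mismatch\n";
```

The payload here is kept small so that writing and reading in the same process cannot block on a full socket buffer; with real peers, each side would run its half of the exchange.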

        I also found that when it comes to transmitting arrays and hashes, using pack/unpack is usually more compact (and therefore faster) than using Storable, because (for example) an integer always requires 4 or 8 bytes in binary, but for many values it is shorter in ASCII:

            use Storable qw[ freeze ];;

            @a = 1..100;;
            $packed = pack 'n/(n/a*)', @a;;
            print length $packed;;
            394

            $ice = freeze \@a;;
            print length $ice;;
            412

            @b = unpack 'n/(n/a*)', $packed;;
            print "@b";;
            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ...

            %h = 'aaaa'..'aaaz';;
            $packed = pack 'n/(n/a*)', %h;;
            print length $packed;;
            158

            $ice = freeze \%h;;
            print length $ice;;
            202

            %h2 = unpack 'n/(n/a*)', $packed;;
            pp \%h2;;
            {
              aaaa => "aaab",
              aaac => "aaad",
              aaae => "aaaf",
              aaag => "aaah",
              aaai => "aaaj",
              aaak => "aaal",
              aaam => "aaan",
              aaao => "aaap",
              aaaq => "aaar",
              aaas => "aaat",
              aaau => "aaav",
              aaaw => "aaax",
              aaay => "aaaz",
            }

        It doesn't always work out smaller, but it is usually faster and platform independent.

        Of course, Storable wins if your data structures can contain references to others.



Re: Best technique to code/decode binary data for inter-machine communication?
by stonecolddevin (Vicar) on Aug 15, 2012 at 19:03 UTC

    There's also Thrift, but I don't know how its Perl bindings stand up speed-wise. JSON::XS gets my second, because it's easy and the XS parser is relatively fast.

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

Node Type: perlquestion [id://987602]
Front-paged by Arunbear