flexvault has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I'm trying to find the best way to encode/decode binary data to be transferred between client machines and server machine(s). Different operating systems change or delete characters in the stream. Pack/unpack (hex) is better than any regex I could come up with, but it doubles the size of the data transferred. MIME::Base64 is better than pack/unpack: in my tests it is faster and grows the data by about a third instead of doubling it. The 'Simple Copy' step is obviously the fastest possible but can't be used for inter-machine communication.
Is there something better than MIME::Base64?
This script shows the best of what I have tested:
use strict;
use warnings;
use Benchmark qw(:all);
use MIME::Base64;

# Test data: every byte value 0..255 exactly once.
our $src = join '', map { chr } 0 .. 255;

# A count of 0 tells Benchmark to run each case for at least 3 CPU seconds.
my $count = 0;

print "case1: pack/unpack -- ", case1(), " size, $count, \n";
print "case2: MIME::Base64 -- ", case2(), " size, $count, \n";
print "case3: Simple Copy -- ", case3(), " size, $count, \n";

my %cases = (
    case1 => \&case1,
    case2 => \&case2,
    case3 => \&case3,
);
timethese( $count, \%cases );
cmpthese( $count, \%cases );

# Hex encoding via unpack/pack: two output characters per input byte.
sub case1 {
    my $result = unpack( "H*", $src );
    my $back   = pack( "H*", $result );
    if ( $src ne $back ) { print "1 ne\n"; exit; }
    return length($result);
}

# Base64 on a single line (empty line terminator).
sub case2 {
    my $result = encode_base64( $src, "" );
    my $back   = decode_base64($result);
    if ( $src ne $back ) { print "2 ne\n"; exit; }
    return length($result);
}

# Baseline: plain copy, no encoding at all.
sub case3 {
    my $back = $src;
    if ( $src ne $back ) { print "3 ne\n"; exit; }
    return length($back);
}
__END__
email7:/usr/local/apache/prefork/cgi-bin/t# pyrperl Hex_vs_B64.plx
case1: pack/unpack -- 512 size, 0,
case2: MIME::Base64 -- 344 size, 0,
case3: Simple Copy -- 256 size, 0,
Benchmark: running case1, case2, case3 for at least 3 CPU seconds...
case1: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 142625.48/s (n=447844)
case2: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 204192.93/s (n=635040)
case3: 4 wallclock secs ( 3.18 usr + 0.00 sys = 3.18 CPU) @ 1199358.18/s (n=3813959)
           Rate case1 case2 case3
case1  141559/s    --  -31%  -88%
case2  205511/s   45%    --  -83%
case3 1203509/s  750%  486%    --
Thank you
"Well done is better than well said." - Benjamin Franklin
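The sizes reported above (512 for hex, 344 for Base64, from 256 input bytes) follow from simple arithmetic, sketched below. The helper names `hex_size` and `base64_size` are made up for illustration and are not part of any module; the Base64 figure assumes the encoder is called with an empty line terminator, as in case2.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helpers (not from the post): predicted encoded size for $n input bytes.
sub hex_size    { my ($n) = @_; return 2 * $n }                      # 2 hex digits per byte
sub base64_size { my ($n) = @_; return 4 * int( ( $n + 2 ) / 3 ) }   # 4 chars per 3-byte group, padded

my $src = join '', map { chr } 0 .. 255;
my $n   = length $src;

printf "hex:    predicted %d, actual %d\n", hex_size($n),    length( unpack 'H*', $src );
printf "base64: predicted %d, actual %d\n", base64_size($n), length( encode_base64( $src, '' ) );
```

So hex always costs 100% overhead, while Base64 costs roughly 33% plus padding.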
Re: Best technique to code/decode binary data for inter-machine communication?
by BrowserUk (Patriarch) on Aug 15, 2012 at 18:49 UTC
Different operating systems change or delete characters in the stream.
Why not just binmode the socket (pipe/filehandle) that you do the transfer over?
Then you don't need to do any encoding or decoding.
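A minimal sketch of the idea, assuming a local pipe as a stand-in for the real socket (the thread does not show the actual connection code): once both ends are in binmode, all 256 byte values pass through unchanged.

```perl
use strict;
use warnings;

# Every byte value once, including "\r", "\n" and "\x1a", which text-mode
# I/O layers on some platforms translate or treat specially.
my $payload = join '', map { chr } 0 .. 255;

# Assumption: a pipe stands in for the client/server socket.
pipe( my $reader, my $writer ) or die "pipe failed: $!";
binmode $_ for $reader, $writer;    # raw bytes on both ends

print {$writer} $payload or die "write failed: $!";
close $writer or die "close failed: $!";

# Slurp everything the writer sent.
my $received = do { local $/; <$reader> };
close $reader;

print $received eq $payload ? "round trip ok\n" : "data changed\n";
```

The same `binmode` call applies to a socket handle; both peers have to do it.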
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Hello BrowserUk,
That was my first try, but I had buffer problems. I may have done something wrong, so I will look at that solution. I'm glad you reminded me, since that would be the best solution, but some clients just hung. Again, I'll revisit that and let you know.
I implemented your earlier solution using 'fork', which I'm much more familiar with. This week I wanted to read up on 'threads', but the new Camel book removed chapter 17 on threads...what a disappointment. Are there any good 'paper books' on the subject?
Old habits, I like to mark up the pages. It helps me when I go back for reference.
Regards...Ed
$to->send( pack 'n/a*', $binData );
...
$from->recv( my $len, 2 );
$from->recv( my $binData, unpack 'n', $len );
That's good for packets up to 64k in length. Switch to 'N' to handle up to 4GB.
The nice thing about this is that the receiver always knows how much to ask for; and can verify that he got it (length $binData) which avoids the need for delimiters and works just as well with non-blocking sockets if you need to go that way.
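The length-prefix scheme described above could be wrapped up roughly as follows. This is only a sketch: `write_frame`, `read_exact` and `read_frame` are hypothetical helper names, a pipe stands in for the socket, and the 'N' (32-bit) prefix is used instead of 'n'. On a real socket using recv or sysread, a single call may return fewer bytes than requested, which is why the read is done in a loop.

```perl
use strict;
use warnings;

# Hypothetical helper: send one message as a 4-byte big-endian length plus payload.
sub write_frame {
    my ( $fh, $payload ) = @_;
    print {$fh} pack( 'N/a*', $payload ) or die "write failed: $!";
}

# Hypothetical helper: keep reading until exactly $want bytes have arrived.
sub read_exact {
    my ( $fh, $want ) = @_;
    my $buf = '';
    while ( length($buf) < $want ) {
        my $got = read( $fh, $buf, $want - length($buf), length($buf) );
        die "read failed: $!" unless defined $got;
        die "unexpected EOF"  if $got == 0;
    }
    return $buf;
}

# Hypothetical helper: read the length prefix, then that many payload bytes.
sub read_frame {
    my ($fh) = @_;
    my $len = unpack 'N', read_exact( $fh, 4 );
    return read_exact( $fh, $len );
}

# Demo over a pipe (an assumption; the thread uses sockets).
pipe( my $from, my $to ) or die "pipe failed: $!";
binmode $_ for $from, $to;

my @messages = ( 'short', join( '', map { chr } 0 .. 255 ), '' );
write_frame( $to, $_ ) for @messages;
close $to or die "close failed: $!";

my @received = map { read_frame($from) } @messages;
my $intact   = grep { $received[$_] eq $messages[$_] } 0 .. $#messages;
printf "%d of %d frames intact\n", $intact, scalar @messages;
```

Because each frame announces its own size, the payload can contain any bytes, including ones that would otherwise look like delimiters.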
I also found that when it comes to transmitting arrays and hashes, using pack/unpack is usually more compact (and therefore faster) than using Storable, because (for example) an integer always requires 4 or 8 bytes in binary, but for many values it is shorter in ASCII:
use Storable qw[ freeze ];;
@a = 1..100;;
$packed = pack 'n/(n/a*)', @a;;
print length $packed;;
394
$ice = freeze \@a;;
print length $ice;;
412
@b = unpack 'n/(n/a*)', $packed;;
print "@b";;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ...
%h = 'aaaa'..'aaaz';;
$packed = pack 'n/(n/a*)', %h;;
print length $packed;;
158
$ice = freeze \%h;;
print length $ice;;
202
%h2 = unpack 'n/(n/a*)', $packed;;
pp \%h2;;
{
aaaa => "aaab",
aaac => "aaad",
aaae => "aaaf",
aaag => "aaah",
aaai => "aaaj",
aaak => "aaal",
aaam => "aaan",
aaao => "aaap",
aaaq => "aaar",
aaas => "aaat",
aaau => "aaav",
aaaw => "aaax",
aaay => "aaaz",
}
It doesn't always work out smaller, but it is usually faster and platform independent.
Of course, Storable wins if your data structures can contain references to others.
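To illustrate that last point, here is a small sketch of a nested structure that a flat pack template cannot express directly. It uses Storable's `nfreeze` (network byte order) rather than `freeze`, on the assumption that the data crosses machines; the `%config` contents are invented for the example.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Invented sample data: a hash holding an array ref and a hash ref.
my %config = (
    hosts => [ 'alpha', 'beta' ],
    ports => { http => 80, https => 443 },
);

my $frozen = nfreeze( \%config );    # portable byte string, ready to send
my $copy   = thaw($frozen);          # receiving side rebuilds the structure

print "hosts: @{ $copy->{hosts} }\n";
print "https port: $copy->{ports}{https}\n";
```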
Were you using \n as a record separator in your protocol by any chance? Binmode would prevent conversions in transmission, but the clients would be parsing the input differently, and some would never see a "\n" since they're really looking for "\r\n".
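If that is the problem, one way around it is to have the reader accept either line ending. A rough sketch, with `strip_eol` as a made-up helper name and an in-memory filehandle standing in for the socket:

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing line terminator, whether LF or CRLF.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# The same two-record message as sent by a CRLF client and an LF client.
my $wire = "HELLO\r\nWORLD\n";

# Read it raw, so no layer translates line endings behind our back.
open my $in, '<:raw', \$wire or die "open failed: $!";
my @records = map { strip_eol($_) } <$in>;
close $in;

print join( '|', @records ), "\n";    # HELLO|WORLD
```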
Re: Best technique to code/decode binary data for inter-machine communication?
by rurban (Scribe) on Aug 15, 2012 at 18:43 UTC
Use any cross-platform serialization library. There also exist cross-language serialization libraries.
Those 3 are fastest and do both:
JSON::XS, Data::MessagePack, protobuf-perlxs or such.
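As a rough sketch of the serialization route: the example below uses the core module JSON::PP as a stand-in, since it offers the same `new`/`encode`/`decode` interface as JSON::XS. JSON is a text format, so the binary field is still Base64-encoded here; the field names are invented for illustration.

```perl
use strict;
use warnings;
use JSON::PP;
use MIME::Base64 qw(encode_base64 decode_base64);

my $binary = join '', map { chr } 0 .. 255;

# canonical() sorts keys so the output is stable between runs.
my $json = JSON::PP->new->canonical;

# Sender: wrap the payload with some metadata (invented field names).
my $wire = $json->encode(
    {
        name => 'blob.bin',
        size => length($binary),
        data => encode_base64( $binary, '' ),
    }
);

# Receiver: parse the message and recover the bytes.
my $msg  = $json->decode($wire);
my $back = decode_base64( $msg->{data} );

print $back eq $binary ? "payload intact\n" : "payload changed\n";
```

Binary-capable formats such as MessagePack or Protocol Buffers avoid the Base64 step, at the cost of a non-core dependency.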
Re: Best technique to code/decode binary data for inter-machine communication?
by zentara (Archbishop) on Aug 15, 2012 at 18:43 UTC
Is there something better than MIME::Base64?
Since we are just talking about making printable characters for transmission, you would want to shorten the length of the string, so how about using Math::BaseCnv to go from Base64 to Base128? Or use compression on the file, before doing the base conversion, with something like
#!/usr/bin/perl
use Compress::Zlib;
use MIME::Base64;
my $str = "Hello World! " x 3;
my $gzip = Compress::Zlib::memGzip( $str );
my $hex = unpack 'H*', $gzip;
my $base64 = encode_base64('Aladdin:open sesame');  # note: encodes a sample string, not $gzip
my $str_len = length($str);
my $gzip_len = length($gzip);
my $hex_len = length($hex);
my $base64_len = length($base64);
# make binary printable ;-)
$gzip = '#' x $gzip_len;
printf "%3d: %s\n%3d: %s\n%3d: %s\n%3d: %s\n",
$str_len, $str, $gzip_len, $gzip, $hex_len, $hex, $base64_len, $base64;
__DATA__
39: Hello World! Hello World! Hello World!
36: ####################################
72: 1f8b0800000000000003f348cdc9c95708cf2fca495154f0c0c90100b9a8ae3827000000
29: QWxhZGRpbjpvcGVuIHNlc2FtZQ==
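The script above Base64-encodes an unrelated sample string. A sketch of the full "compress, then encode" round trip might look like the following; the longer, repetitive test string is an assumption chosen so that compression has something to work with.

```perl
use strict;
use warnings;
use Compress::Zlib;
use MIME::Base64 qw(encode_base64 decode_base64);

my $str = "Hello World! " x 100;    # repetitive input, so it compresses well

# Sender: gzip first, then make the result printable.
my $gzip = Compress::Zlib::memGzip($str);
defined $gzip or die "memGzip failed";
my $base64 = encode_base64( $gzip, '' );

printf "original %d bytes, gzipped %d bytes, base64 of gzip %d chars\n",
    length($str), length($gzip), length($base64);

# Receiver: undo the steps in reverse order.
# memGunzip may clobber its input buffer, so hand it a scratch copy.
my $gz_in = decode_base64($base64);
my $back  = Compress::Zlib::memGunzip($gz_in);
defined $back or die "memGunzip failed";

print $back eq $str ? "round trip ok\n" : "data changed\n";
```

For data that is already compressed or random, the gzip step will not help and the Base64 overhead remains.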
Re: Best technique to code/decode binary data for inter-machine communication?
by stonecolddevin (Parson) on Aug 15, 2012 at 19:03 UTC
There's also Thrift, but I don't know how its perl bindings stand up speed-wise. JSON::XS gets my second, because it's easy and the XS parser is relatively fast.
Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past
Re: Best technique to code/decode binary data for inter-machine communication?
by Anonymous Monk on Aug 15, 2012 at 18:27 UTC
I would use a well-known protocol such as SOAP, using the existing CPAN Perl modules that are built to do that. All of the technical issues have been taken care of, and who really cares exactly how much data-size it takes.
And the benchmark is ?
I would use a well-known protocol such as SOAP, using the existing CPAN Perl modules that are built to do that. All of the technical issues have been taken care of, and who really cares exactly how much data-size it takes. LOL
"who really cares exactly how much data-size it takes"
Lots of people!
Data size and protocol overheads are very important things to monitor in my line of work. Our customers have massive amounts of equipment communicating over a county-, territory- or even state-wide 100Mbps* private network. The occasional software and configuration push cannot impede the normal traffic or people might die.
* typically, some have faster networks