PerlMonks  

Serialise to binary?

by sectokia (Sexton)
on Oct 25, 2015 at 22:33 UTC (#1145896)

sectokia has asked for the wisdom of the Perl Monks concerning the following question:

Hi wise monks,

Are there any CPAN modules that serialise data structures to binary without the cross-platform compatibility issues associated with Storable?

I was using Storable, but since it wasn't portable and gave me problems when switching systems, I have moved on to JSON / Data::Dumper.

However, neither of these stores binary data as binary, which is costing me space. My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.
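A small sketch of where the extra space comes from, using the core JSON::PP and MIME::Base64 modules (the 3000-byte random blob is a made-up stand-in for already-compressed data). Put raw bytes into a JSON string and each byte is treated as a character, so bytes at or above 0x80 become two UTF-8 bytes and control bytes become escapes; Base64-encoding first costs a fixed 4 bytes for every 3.

```perl
use strict;
use warnings;
use JSON::PP ();
use MIME::Base64 qw(encode_base64);

# Stand-in for already-compressed data: 3000 pseudo-random bytes.
my $blob = join '', map { chr int rand 256 } 1 .. 3000;

my $json = JSON::PP->new->utf8;   # same settings as encode_json()

# Option 1: raw bytes as a JSON string. Each byte is treated as a character,
# so bytes >= 0x80 become two UTF-8 bytes and control bytes become escapes.
my $as_text = $json->encode([$blob]);

# Option 2: Base64 first (empty end-of-line string, so no newlines);
# every 3 input bytes become 4 ASCII characters.
my $as_b64 = $json->encode([ encode_base64($blob, '') ]);

printf "raw blob:            %d bytes\n", length $blob;
printf "JSON with raw bytes: %d bytes\n", length $as_text;
printf "JSON with Base64:    %d bytes\n", length $as_b64;   # 4000 + 4 for ["..."]
```

Either way the text form is bigger than the bytes themselves, which is the overhead a binary serializer avoids.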

Re: Serialise to binary?
by davido (Cardinal) on Oct 26, 2015 at 01:26 UTC

    BSON implements BSON - Binary JSON, which is "...a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type."

    Sounds like that could be a decent fit, particularly since it is probably more portable than Storable.


    Dave

      I tried BSON and found that it throws a lot of warnings on basic structures. In particular, it seems to misidentify some scalars as floats and then tries to pack them as floats, causing "argument isn't numeric in pack". I also found floats coming back with wildly inconsistent values after being encoded and then decoded.

      Because it's so heavily tied to MongoDB, they don't seem to really care about it being able to encode arbitrary data structures (as evidenced by the fact that you have to pass a hash ref; no array ref allowed). They just want to decode their own binary data as they use it in Mongo and re-encode structures set up the same way.

      So I wouldn't recommend it.

        they don't seem to really care about it being able to encode arbitrary data structures

        I think that's an unkind assumption about intent. (N.B. I am the current maintainer.)

        But like JSON, BSON is document-oriented, so is not designed to store raw arrays or scalars the way Storable or Sereal will. So in that sense, it might not be the right choice for your needs.

        Beyond that however, the goal of BSON is to handle whatever you can throw at it as best as possible given the ambiguities mapping data between a dynamic, largely typeless language like Perl and a typed data format like BSON. Knowing that some Perl scalar is binary data and not an arbitrary string is impossible without some hints from the programmer.

        The MongoDB::BSON implementation is in XS and has been part of the MongoDB driver distribution. We hope to eventually split it out so that it can be used independently where warranted.

        The BSON.pm implementation is pure Perl and was originally developed outside MongoDB (but has since been adopted by the company). There are still some areas where it is not yet as good as MongoDB::BSON.

        Even if BSON is not right for this particular problem, if anyone experiences bugs using either implementation, I encourage you to report them or at least email us about them so we can fix them.

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Re: Serialise to binary?
by Corion (Pope) on Oct 26, 2015 at 09:12 UTC

    Depending on your data structure, you might have more luck with Sereal.

Re: Serialise to binary?
by RichardK (Parson) on Oct 26, 2015 at 00:30 UTC

    You might have to do it yourself using pack, but it does give you 16 & 32 bit ints in both big-endian & little-endian.

    #from the docs
    n  An unsigned short (16-bit) in "network" (big-endian) order.
    N  An unsigned long (32-bit) in "network" (big-endian) order.
    v  An unsigned short (16-bit) in "VAX" (little-endian) order.
    V  An unsigned long (32-bit) in "VAX" (little-endian) order.
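A minimal sketch of the do-it-yourself route, assuming the data is a flat list of byte strings; the format and the helper names `encode_blobs`/`decode_blobs` are hypothetical. The `N/a*` template writes a 32-bit big-endian length followed by the bytes, so the layout does not depend on the platform, and `unpack` reads it back with the same template.

```perl
use strict;
use warnings;

# Hypothetical format: a 32-bit big-endian count, then for each blob a
# 32-bit big-endian length followed by the raw bytes ("N/a*").
sub encode_blobs {
    my @blobs = @_;
    # pack 'a' expects byte strings; encode any text (e.g. to UTF-8) first
    return pack 'N (N/a*)*', scalar(@blobs), @blobs;
}

sub decode_blobs {
    my ($buf) = @_;
    my ($count, @blobs) = unpack 'N (N/a*)*', $buf;
    die "corrupt buffer: expected $count blobs, got " . @blobs . "\n"
        unless @blobs == $count;
    return @blobs;
}

my @in  = ("\x00\xff\x10", '', join('', map { chr } 0 .. 255));
my $buf = encode_blobs(@in);
my @out = decode_blobs($buf);

# 4 (count) + (4 + 3) + (4 + 0) + (4 + 256) = 275 bytes
printf "%d blobs, %d bytes\n", scalar(@out), length $buf;
```

Only 4 bytes of framing per blob are added, and already-compressed data is stored as-is.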
Re: Serialise to binary?
by Laurent_R (Canon) on Oct 26, 2015 at 00:08 UTC
    Hmm, I am afraid that if you want a binary format that is cross-platform compatible, you'll have to define it yourself, I do not think there is a standard for such a format.

    Well, of course, you may find an existing binary format for data exchange between a given platform pair, or even for a few platforms, but I do not think there is any standard one that is really cross-platform in the wider sense. And there are plenty of quite compelling reasons for that, one of them being that there is not one binary format, but several.
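To illustrate the "several binary formats" point: the same 32-bit integer is laid out differently depending on byte order. Perl's native `L` template follows the machine, while `N` and `V` fix the order explicitly (which is also why Storable offers `nstore`/`nfreeze`, which write in network order). A small sketch using only core Perl and the core Config module:

```perl
use strict;
use warnings;
use Config;

my $native = pack 'L', 1;   # 32-bit unsigned, in this machine's byte order
my $big    = pack 'N', 1;   # 32-bit unsigned, big-endian ("network")
my $little = pack 'V', 1;   # 32-bit unsigned, little-endian ("VAX")

printf "byteorder: %s\n", $Config{byteorder};
printf "native:    %s\n", unpack 'H*', $native;
printf "big:       %s\n", unpack 'H*', $big;      # 00000001
printf "little:    %s\n", unpack 'H*', $little;   # 01000000
```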

    Having said that, some network protocols do define low-level binary formats that you might want to use. But I am not convinced that such low-level formats fit your functional/business needs. You did not say enough about your requirements for a definite answer (which I would probably not be able to give anyway: I haven't worked in this area for about 15 years and I don't remember enough about these things).

Re: Serialise to binary?
by Your Mother (Bishop) on Oct 25, 2015 at 23:39 UTC
    My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.

    This is highly unlikely. If you serialize without whitespace and compress, and it is still substantially bigger than Storable…? This is not my forte, but post your code and I’m sure someone can show you where it’s gone sideways.

      An example is when there are a huge number of scalars with random contents. Here compressed Storable has 33% overhead, whereas compressed JSON has 70%+ overhead.

      use strict;
      use warnings;
      use Storable;
      use IO::Compress::Gzip qw(gzip $GzipError);
      use JSON::XS;

      # 100,000 one-character scalars with random byte values
      my @data;
      push @data, chr(int(rand(256))) for 1 .. 100_000;

      my $serial = Storable::nfreeze(\@data);   # network byte order
      my $json   = encode_json(\@data);         # UTF-8 JSON text

      my ($gzserial, $gzjson);
      gzip \$json   => \$gzjson   or die "gzip failed: $GzipError";
      gzip \$serial => \$gzserial or die "gzip failed: $GzipError";

      print scalar(@data), "\n";       # element count
      print length($serial), "\n";     # Storable, raw
      print length($gzserial), "\n";   # Storable, gzipped
      print length($json), "\n";       # JSON, raw
      print length($gzjson), "\n";     # JSON, gzipped

        Oh, nice! I was about to argue that one-character scalars, with quotation marks making up more than 60% of the data, rigged the test in favor of Storable, but I upped the "word" size and the difference remains at about 30% in favor of Storable. Sidebar: on my box at least, Storable sees a *negative* change from zipping, i.e., the zipped Storable output is slightly bigger than the raw nstore output.

Re: Serialise to binary?
by BrowserUk (Pope) on Oct 26, 2015 at 15:18 UTC
    My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.

    How big is this data?

    How often are you interchanging this data?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Serialise to binary?
by sundialsvc4 (Abbot) on Oct 26, 2015 at 00:48 UTC

    I, too, no longer use Storable, after I found that it could not reliably thaw data that had been frozen on the same system. Bleah.

    If your data is “already compressed,” it is quite possible that compressing the JSON-encoded form will not shrink it nearly as much as it usually would. However, I cordially suggest that the best thing to do is to accept this. Your number-one concern is that “every platform will be able to decode and process these data, 100% of the time.” If file size is the price you pay for that (and assuming that Laurent doesn’t very quickly find a bug in your code once you post it as he requested), then, I recommend, c’est la guerre: so be it. JSON is known to be portable, it is well supported in many languages, and it is standard, as is Zip compression.
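A quick way to check the "already compressed" point with core modules: gzip finds almost nothing to remove from random bytes (used here as a stand-in for compressed data), while repetitive text shrinks dramatically. The sizes printed are illustrative; only the direction of the change matters.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

my $random     = join '', map { chr int rand 256 } 1 .. 100_000;  # like compressed data
my $repetitive = 'abc' x 33_334;                                   # highly redundant text

my ($gz_random, $gz_repetitive);
gzip \$random     => \$gz_random     or die "gzip failed: $GzipError\n";
gzip \$repetitive => \$gz_repetitive or die "gzip failed: $GzipError\n";

printf "random:     %6d -> %6d bytes\n", length $random,     length $gz_random;
printf "repetitive: %6d -> %6d bytes\n", length $repetitive, length $gz_repetitive;
```

On random input the gzip header and block framing typically make the output slightly larger than the input.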

    Alternatively, I would agree with Richard: write your own format using pack, and document it explicitly. If the receiving system does not support the same word size or byte order (and does not have access to Perl!), then the monkey would be on their back to decode your file correctly according to the specification you have given them. It is very unlikely that these files will be very compressible, as noted above, and now the receiving party might be obliged to write a possibly tricky custom program. (But if they have a compatible Perl, they can more or less use or adapt your scripts, now using unpack, which accepts the same format strings.)

    But, of the two, I would still stick with JSON absent other extenuating circumstances, because the burden on the receiving system (and its programmer) is far lower. (It pays to “return the favor” to such people whenever possible.)

      Did you by any chance use references as hash keys? Things like that will never work anywhere under any freeze/thaw system.
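A small illustration of this pitfall with Storable (any freeze/thaw scheme behaves the same way): a reference used as a hash key is stringified to something like "HASH(0x...)", so after thawing, the key names an address that the new copy does not have.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

my $obj  = { name => 'widget' };
my %seen = ($obj => 1);   # the key is the *string* "HASH(0x...)", not the reference

my ($obj2, $seen2) = @{ thaw(nfreeze([ $obj, \%seen ])) };

# thaw() builds new hashes at new addresses, so the stored key no longer matches
print exists $seen2->{$obj2} ? "found\n" : "not found\n";   # prints "not found"
print exists $seen2->{$obj}  ? "old key still there\n" : "gone\n";
```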

      Did you report the problem and provide a reproducible example?

      I'd like to know what you did to break it. I use Storable because everything else either doesn't support circular references and/or takes 10 minutes to serialize instead of 1-2 seconds, and it works great between 64-bit Linux and 32-bit Windows boxes.
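A minimal sketch of the circular-reference point using core modules: Storable round-trips a parent/child cycle, while a JSON encoder cannot represent one. JSON::PP is used here as a stand-in for the JSON modules discussed above; it keeps descending into the cycle until it hits its nesting limit and croaks.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use JSON::PP ();

# Parent and child point at each other: a reference cycle.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Storable records each reference once, so the cycle survives the round trip.
my $copy = thaw(nfreeze($parent));
print $copy->{child}{parent} == $copy ? "Storable: cycle preserved\n"
                                      : "Storable: cycle broken\n";

# JSON has no notion of shared references.
my $ok = eval { JSON::PP->new->encode($parent); 1 };
print $ok ? "JSON::PP: encoded\n" : "JSON::PP: refused ($@)";
```

Note that both `$parent` and `$copy` are cycles, so Perl will not free them until the cycle is broken or weakened (e.g. with `Scalar::Util::weaken`).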

Node Type: perlquestion [id://1145896]
Approved by Laurent_R