Serialization Format that can be read back in chunks?

by saintmike (Vicar)
on Mar 29, 2007 at 16:25 UTC ( #607307=perlquestion )

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

I have structured data that I'd like to write out in some kind of plain text format. Usually, I use YAML for this kind of task, but in this particular case the overall size of the data prohibits writing it out or reading it in in one piece.

Looking at the YAML documentation, I can't find a way to write out or read in the data in chunks. Or is there a way?

Alternatively, what other serialization formats could be suitable for this task?

Replies are listed 'Best First'.
Re: Serialization Format that can be read back in chunks?
by jettero (Monsignor) on Mar 29, 2007 at 16:34 UTC

    Depending on your goals, you might like the look of Storable combined with DB_File or, even better, DBM::Deep. DBM::Deep is wow. Since you specifically mention YAML and "text", I suspect this isn't quite the direction you're trying to go, but it's worth mentioning just in case.

    My meaning is that when the data gets really big, a human readable layout may not be the best choice. The alternative is to break up the data and use the file-system as your "database." But I don't know anything about the problem you're trying to solve.

    -Paul
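As a rough illustration of the "file system as database" idea above, here is a minimal sketch that stores each top-level record in its own Storable file, so one record can be written or read without loading the rest. The directory layout, the `.sto` suffix, and the helper names `record_path`, `save_record`, and `load_record` are assumptions made for this example, not something from the thread. Keys are assumed to be safe to use as file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw( nstore retrieve );
use File::Temp qw( tempdir );
use File::Spec;

# Hypothetical layout: one Storable file per top-level record, named by key.
my $dir = tempdir( CLEANUP => 1 );

# Map a record key to its file (assumes the key is a valid file name).
sub record_path {
    my ($key) = @_;
    return File::Spec->catfile( $dir, "$key.sto" );
}

# Write a single record without touching any of the others.
sub save_record {
    my ( $key, $data ) = @_;
    nstore( $data, record_path($key) );
    return;
}

# Read a single record back; only this one file is loaded into memory.
sub load_record {
    my ($key) = @_;
    return retrieve( record_path($key) );
}

save_record( alice => { age => 30, langs => [ 'perl', 'c' ] } );
save_record( bob   => { age => 41, langs => ['awk'] } );

my $rec = load_record('alice');
print "alice knows: @{ $rec->{langs} }\n";
```

The files are binary rather than human-readable, which is the trade-off mentioned above. DBM::Deep or DB_File would keep everything in a single file instead, at the cost of an extra dependency.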

Re: Serialization Format that can be read back in chunks?
by zentara (Archbishop) on Mar 29, 2007 at 20:22 UTC
    Can't base64 (or even Math::Base85 ) do chunks?

    Untested, and I don't remember what the 57 * 60 is all about. :-) Probably something to do with ideal chunk size for base64.

    #!/usr/bin/perl
    # encode: read the input in chunks of 60 * 57 bytes and base64-encode each chunk
    use strict;
    use warnings;
    use MIME::Base64 qw( encode_base64 );

    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;
    open my $out, '>', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my $buf;
    while ( read( $in, $buf, 60 * 57 ) ) {
        print {$out} encode_base64($buf);
    }
    close $out;
    close $in;

    ###################################################
    #decode_base64.pl:
    #!/usr/bin/perl
    # decode: each input line is a complete base64 unit, so decode line by line
    use strict;
    use warnings;
    use MIME::Base64 qw( decode_base64 );

    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    binmode $out;
    while ( my $buf = <$in> ) {
        print {$out} decode_base64($buf);
    }
    close $out;
    close $in;

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      57 * 4 / 3 = 76

      That is the maximum length of a base64-encoded line allowed by RFC 2045 for MIME-encoded data (mail).

      No idea if the 60 has any hard technical background though.
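The arithmetic above can be checked directly with MIME::Base64. This is only a small sketch using dummy input: 57 input bytes fill exactly one 76-character line, and because 57 is a multiple of 3, any multiple of 57 bytes (such as 60 * 57) encodes to whole lines with no "=" padding inside the stream. The 60 then simply means "60 full lines per read" here.

```perl
use strict;
use warnings;
use MIME::Base64 qw( encode_base64 );

# 57 input bytes -> 57 * 4 / 3 = 76 output characters: exactly one full line.
my $one_line = encode_base64( 'x' x 57 );
chomp $one_line;
print length($one_line), "\n";

# 60 * 57 input bytes -> 60 full lines, none of them padded.
my $chunk = encode_base64( 'x' x ( 60 * 57 ) );
my @lines = split /\n/, $chunk;
print scalar(@lines), "\n";
print( ( $chunk =~ /=/ ) ? "padded\n" : "no padding\n" );
```

Since every chunk ends on a line boundary, concatenating the encoded chunks gives the same result as encoding the whole file at once. Only the final, shorter chunk can carry padding.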

      Anyway, I don't know if such a character level encoding is the solution to Mike's problem. After all, this is completely unaware of the structure of the data encoded.

      Serializers and formats such as Data::Dumper, XML, and YAML generate output that can be described by context-free languages, which is probably no coincidence. By the pumping lemma, the distance between the opening and closing braces, tags, or indentation changes that mark the start and end of an embedded serialized substructure is unbounded, which sounds quite incompatible with "chunks" (i.e. fixed-length pieces) to me.
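One way around the fixed-length problem is to make the chunks variable-length and align them with record boundaries. The sketch below is not something proposed in this thread. It assumes the data can be split into independent top-level records and writes each one as a single line of JSON using the core JSON::PP module. The sample records and the temporary file are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw( tempfile );

# Compact (non-pretty) output never contains a raw newline,
# so one record always occupies exactly one line.
my $json = JSON::PP->new->canonical;

my ( $fh, $file ) = tempfile( UNLINK => 1 );

# Write: each top-level record becomes one self-contained line.
my @records = (
    { id => 1, tags   => [ 'a', 'b' ] },
    { id => 2, nested => { deep => { deeper => 'x' } } },
);
print {$fh} $json->encode($_), "\n" for @records;
close $fh or die "Cannot close $file: $!";

# Read: one line at a time, so memory use is bounded by the largest record.
open my $in, '<', $file or die "Cannot open $file: $!";
my @ids;
while ( my $line = <$in> ) {
    chomp $line;
    my $record = $json->decode($line);
    push @ids, $record->{id};
}
close $in;
print "read records: @ids\n";
```

Nesting inside a record is still unbounded, as argued above, but the newline gives the reader a cheap place to cut without parsing the whole file.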

Node Type: perlquestion [id://607307]
Approved by jettero