Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Binary data structure to moose class.

by chrestomanci (Priest)
on Jan 02, 2013 at 15:14 UTC ( #1011290=perlquestion: print w/ replies, xml ) Need Help??
chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greeting wise brothers.

I am looking a Moose class reprentation of a binary file format. (The binary file is the metadata file format used by Humax Digital satelite TV recevers. Documented here: http://foxsatdisk.wikispaces.com/.hmt+file+format)

Seeing as the Moose BestPractices discorage overiding new(), the approach I am taking is to insert the raw data as a field in the class, and then have a large number of lazy builders to extract fields. My API currenlty looks like this:

my $hmt_data = new Binary::Humax::hmt_data(); # Object will be us +eless as it contains no data $hmt_data->raw_from_file($path_name); # At this point onl +y the 'rawData' field is populated my $field = $hmt_data->startTime(); # Invokes a lazy bu +ilder to extract the field.

Some of the code in the class looks like this

use DateTime; use Moose; # The raw data that all the fields are extracted from. has 'rawDataBlock' => ( is => 'rw', isa => 'Str', ); # All the fields extracted from the raw data block are lazy using buil +ders below. has 'lastPlay' => ( lazy_build => 1, is=>'rw', isa=>'Int'); has 'ChanNum' => ( lazy_build => 1, is=>'rw', isa=>'Int'); has 'startTime' => ( lazy_build => 1, is=>'rw', isa=>'DateTime') +; has 'fileName' => ( lazy_build => 1, is=>'rw', isa=>'Str'); [...] sub _build_lastPlay { my $self = shift @_; return unpack('@5 S', $self->rawDataBlock() ); } sub _build_ChanNum { my $self = shift @_; return unpack('@17 S', $self->rawDataBlock() ); } sub _build_startTime { my $self = shift @_; my $epoch = unpack('@25 N', $self->rawDataBlock() ); return DateTime->from_epoch( epoch => $epoch, time_zone => 'GMT' ) +; } sub _build_fileName { my $self = shift @_; return unpack('@33 A512', $self->rawDataBlock() ); }

My problem is that there are rather a lot of those lazy builders, and they are all similar and repetitive. Is there a way I can make all the fields share the same lazy builder, perhaps by putting the unpack information as an attribute to the field definition?

Secondly, when complete, I plan to upload the module to CPAN. Have I chosen a good name for it, or should it live in a different name space?

Comment on Binary data structure to moose class.
Select or Download Code
Re: Binary data structure to moose class.
by tobyink (Abbot) on Jan 02, 2013 at 16:07 UTC

    One of the core ideas behind Moose is that of metaprogramming. That is; don't write programs - write programs which write programs.

    For example, rather than defining our attributes the old-fashioned way, like:

    sub some_attribute { my $self = shift; $self->{some_attribute} = shift if @_; return $self->{some_attribute}; }

    We just write:

    has some_attribute => (is => 'rw');

    The has function is a "program which writes programs". It makes our sub some_attribute for us!

    So the solution is to write something that does the same sort of job as has, but has some domain-specific knowledge about grabbing raw data, unpacking it, etc. Here's an example (untested):

    { package Binary::Humax::HmtData; use DateTime; use Moose; has raw_data_block => ( is => 'rw', isa => 'Str', required => 1, ); # Shortcut function for defining attributes # sub _my_has { my ($name, %spec) = @_; my $meta = __PACKAGE__->meta; # Default attribute to 'rw' $spec{is} //= 'rw'; # Set up lazy builder if (my $unpack = delete $spec{unpack}) { $spec{lazy} //= 1; $spec{builder} //= "_build_$name"; if (my $postprocess = delete $spec{postprocess}) { $meta->add_method($spec{builder}, sub { my $self = shift; local $_ = unpack($unpack, $self->raw_data_block); $postprocess->(); }); } else { $meta->add_method($spec{builder}, sub { my $self = shift; return unpack($unpack, $self->raw_data_block); }); } } $meta->add_attribute($name, \%spec); } # Now use that shortcut to define each attribute. # _my_has last_play => (isa => 'Int', unpack => '@5 S'); _my_has chan_num => (isa => 'Int', unpack => '@17 S'); _my_has start_time => ( isa => 'DateTime', unpack => '@5 S', postprocess => sub { DateTime->from_epoch(epoch => $_, time_zone => 'GMT'); }, ); _my_has file_name => (isa => 'Str', unpack => '@33 A512'); # Create an alternative constructor which wraps "new". # sub new_from_file { my ($class, $filename) = @_; open my $fh, '>', $filename; my $slurp = do { local $/ = <$fh> }; return $class->new(r); } } # # USAGE # my $hmt_data = Binary::Humax::HmtData->new_from_file($path_name); my $field = $hmt_data->start_time;

    "Secondly, when complete, I plan to upload the module to CPAN. Have I chosen a good name for it, or should it live in a different name space?"

    No, it seems like a bad name. You're putting it in "Binary" because it's a binary file format. But presumably end users of your module won't care whether it's a binary file format, a text-based one, or XML-based; they don't care about the file format at all, because they've downloaded your module to abstract those sort of details away, haven't they?

    I would have thought something in the "TV" namespace more fitting.

    UPDATE:; we can go even "more meta" by replacing our has workalike with an attribute trait. This has the advantage of allowing introspection of each attribute to read back its "unpack" code.

    { package Binary::Humax::HmtData::Trait::Attribute; use Moose::Role; has unpack => (is => 'ro', isa => 'Str'); has postprocess => (is => 'ro', isa => 'CodeRef'); before _process_options => sub { my ($meta, $name, $spec) = @_; if ($spec->{unpack}) { $spec->{lazy} //= 1; $spec->{builder} //= "_build_$name"; $spec->{is} //= 'rw'; } }; after attach_to_class => sub { my $attr = shift; my $class = $attr->associated_class; my $unpack = $attr->unpack or return; if (my $postprocess = $attr->postprocess) { $class->add_method($attr->builder, sub { my $self = shift; local $_ = unpack($unpack, $self->raw_data_block); $postprocess->(); }); } else { $class->add_method($attr->builder, sub { my $self = shift; return unpack($unpack, $self->raw_data_block); }); } }; } { package Binary::Humax::HmtData; use DateTime; use Moose; use constant MAGIC => 'Binary::Humax::HmtData::Trait::Attribute'; has raw_data_block => ( is => 'rw', isa => 'Str', required => 1, ); has last_play => ( traits => [ MAGIC ], isa => 'Int', unpack => '@5 S', ); has chan_num => ( traits => [ MAGIC ], isa => 'Int', unpack => '@17 S', ); has start_time => ( traits => [ MAGIC ], isa => 'DateTime', unpack => '@5 S', postprocess => sub { DateTime->from_epoch(epoch => $_, time_zone => 'GMT'); }, ); has file_name => ( traits => [ MAGIC ], isa => 'Str', unpack => '@33 A512', ); } # Attribute introspection print Binary::Humax::HmtData->meta->get_attribute('start_time')->unpac +k, "\n";

    The slight ugliness with this method is that the attribute trait has some knowledge of the class it's being applied to - it knows that the class has a raw_data_block attribute. With a little more work that problem could be eliminated.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      OK, here's a massively generalized version of the above:

      use 5.010; use strict; use warnings; BEGIN { package Trait::Attribute::Derived; no thanks; use MooseX::Role::Parameterized; use List::MoreUtils 'any'; use namespace::autoclean; use base do { package Trait::Attribute::Derived::__CLASS_METHODS__; use Sub::Install 'install_sub'; use namespace::autoclean; my @saved; sub make_trait { my ($pkg, %args) = @_; push @saved, $pkg->meta->generate_role(parameters => \%arg +s); return $saved[-1]->name; } sub import { my $pkg = shift; my $caller = caller; while (@_) { my $name = shift; my $trait = $pkg->make_trait(%{+shift}); install_sub { into => $caller, as => $name, code => sub () { $trait }, } } } __PACKAGE__; }; parameter processor => (is => 'ro', required => 1, isa => 'Code +Ref'); parameter fields => (is => 'ro', required => 1, isa => 'Hash +Ref'); parameter is => (is => 'ro', default => 'ro', isa => 'Str' +); parameter source => (is => 'ro', required => 1, isa => 'Str' +); role { my $p = shift; my @fields = keys %{ $p->fields }; has postprocessor => (is => 'ro', isa => 'CodeRef'); for my $attr (@fields) { has $attr => (is => 'ro', isa => $p->fields->{$attr}); } before _process_options => sub { my ($meta, $name, $spec) = @_; $spec->{is} //= $p->is; $spec->{lazy} //= 1; $spec->{builder} //= "_build_$name"; }; after attach_to_class => sub { my $attr = shift; my $class = $attr->associated_class; return if $class->has_method($attr->builder); my $source = $p->source; my $processor = $p->processor; my $postprocess = $attr->postprocessor; my %data = map { ; $_ => $attr->$_ } @fields; $class->add_method($attr->builder, sub { my $self = shift; local %_ = %data; local $_ = $self->$source; $_ = $self->$processor($_, +{%data}); return $_ unless $postprocess; return $self->$postprocess($_, +{%data}); }); }; }; }; { package Person; use Moose; use Trait::Attribute::Derived Split => { source => 'full_name', fields => { segment => 'Num' }, processor => sub { (split)[$_{segment}] }, }; has full_name => (is => 'ro', isa => 'Str'); has first_name => (traits => [Split], segment => +0); has initial => (traits => [Split], segment => +0, postprocessor + => sub { substr $_, 0, 1 }); has last_name => (traits => [Split], segment => -1); } my $bob = Person->new(full_name => 'Robert Redford'); say $bob->first_name; say $bob->initial; say $bob->last_name;

      If you can understand that and how to apply it to your problem, then you're a true metahacker! :-)

      Update: this is now on CPAN as Trait::Attribute::Derived.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Thank you tobyink, that looks like the solution I was looking for, especialy the second code example you posted. though it will take a bit of experementaion to understand how it works.

Re: Binary data structure to moose class.
by jandrew (Hermit) on Jan 02, 2013 at 23:18 UTC

    tobyink++ is a far superior Moose programmer but you may simplify your code some without going completely meta by using trigger

    use Moose; use MyTypes qw( my_date_time ); has 'rawDataBlock' => ( is => 'rw', isa => 'Str', trigger => \&_unpack_data, ); has 'lastPlay' => ( is =>'rw', isa =>'Int', writer => '_set_lastPlay', ); has 'ChanNum' => ( is=>'rw', isa=>'Int', writer => '_set_ChanNum', ); has 'startTime' => ( is => 'rw', isa => my_date_time, writer => '_set_startTime', coerce => 1, ); has 'fileName' => ( is => 'rw', isa => 'Str', writer => '_set_fileName', ); sub _unpack_data{ my ( $self, $new_data ) = @_; my @results = unpack( $my_template, $new_data); $self->_set_lastPlay( $results[0] ); $self->_set_ChanNum( $results[1] ); $self->_set_startTime( $results[2] ); $self->_set_fileName( $results[3] ); }

    This is sample code and not tested. Note the use of type coersion (my_date_time) to get the data into the right format after it has been unpacked. I always liked the merlyn tutorials The Moose is flying I and the Moose is flying II. MyTypes would use MooseX::Types

    Update: fixed some typos and added the coerce flag to startTime.

      Thanks jandrew For the tip on using triggers. It is certainly a step in the right direction as it will allow me to move all the binary unpacking code into one method, that will populate all the fields in the class at once.

      For my project, I am only relay interested in reading binary data, not writing it. If I where trying to write it as well, then I am not sure that using writers would be helpful, as it looks like that will just create the same problem of lots of very similar methods with very similar boilerplate code. I think an easier approach would be to create a custom reader method on rawDataBlock that will re-construct the binary data by packing all the field members.

      Also, thanks for the link to merlyn's tutorials. I will certainly read them. One problem I have with Moose is there is too much documentation, and it can be hard to find a simplified introduction.

        chrestomanci I personally found that a combination of the Moose::Cookbook followed with a good dose of the Moose::Manual was the most concise way to kick-start my Moose learning. However the Moose presentation from Ricardo Signes is a good read. I'm at $work so I can't check that last link.

        The goal of the writers is not to write binary data in the attributes but to write the unpacked data to each attribute from the object so you can use it as follows. (Note the writers are mostly hidden using the _ prefix convention) Assume the Class is called Data::Humanx and that the MooseX::Types coersion worked. Example not tested

        use Data::Humanx my $instance = Data::Humanx-new( rawDataBlock => $content ); print $instance->startTime->ymd( '/' );

        This should print out the start time value in ymd format from the Humanx binary that was input. (This assumes a DateTime object in the startTime attribute)

        Update: The link is good

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1011290]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (9)
As of 2014-07-10 17:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (214 votes), past polls