Iteratively unpack structure from binary file

by cMonkE (Initiate)
on Oct 20, 2014 at 16:58 UTC ( [id://1104462]=perlquestion )

cMonkE has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks, I humbly seek your wisdom:

I wish to unpack a series of structures from a binary file. I have used:

@array = unpack("f*", join('',<$filehandle>));
to load an array of floats into memory from a smaller binary file, but
a) This file is too big to fit in memory
b) It has a complex structure, like "int, int, float, float, float" repeating.

How shall I iteratively unpack the next structure from the file into a list of scalars, without loading the whole file into memory?

Replies are listed 'Best First'.
Re: Iteratively unpack structure from binary file
by AnomalousMonk (Archbishop) on Oct 20, 2014 at 17:37 UTC

    I would say that the first step is to find the length in bytes of the raw data block represented by  'i i f f f' (remembering that the data sizes of these template specifiers are implementation dependent).

    c:\@Work\Perl>perl -wMstrict -le
    "my $template = 'i2 f3';
     ;;
     sub template_len {
       my ($template) = @_;
       ;;
       my $s = pack $template;
       return length $s;
       }
     ;;
     my $buf_size = template_len($template);
     print $buf_size;
     ;;
     print template_len('i i f f f');
     print template_len('i f3 i');
    "
    20
    20
    20
    Knowing the size of a single raw block allows reading n raw blocks of data at a time with, IIRC, read or sysread (or the appropriate buffered-read built-in) and then unpack-ing with a template like "($template)n" or just "($template)*".
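A minimal sketch of the batched approach described above, assuming the record layout from the question (two native ints followed by three native floats). The batch size `$per_read` and the in-memory sample data are hypothetical, there only so the example runs on its own:

```perl
use strict;
use warnings;

# Assumed record layout: int, int, float, float, float (native size and order).
my $template = 'i2 f3';
my $rec_size = length pack $template, (0) x 5;
my $per_read = 1000;    # hypothetical batch size, in records

# Build a small in-memory "file" so the sketch is self-contained.
my $data = join '', map { pack $template, $_, $_ * 2, 0.5, 1.5, 2.5 } 1 .. 2500;
open my $fh, '<:raw', \$data or die "open: $!";

my ($count, $last_id) = (0);
while (1) {
    my $got = read($fh, my $buf, $rec_size * $per_read);
    die "read: $!\n" unless defined $got;
    last unless $got;                                # 0 means EOF
    die "Trailing partial record\n" if $got % $rec_size;

    # One unpack call for the whole batch: a flat list, 5 values per record.
    my @flat = unpack "($template)*", $buf;
    while (my @fields = splice @flat, 0, 5) {
        $count++;
        $last_id = $fields[0];
    }
}
print "$count records, last id $last_id\n";    # prints "2500 records, last id 2500"
```

Only one batch is held in memory at a time, so the file size no longer matters; a larger `$per_read` trades memory for fewer `read` calls.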

      Well, read buffers, so,
      my $template = 'iifff';
      my $rec_size = template_len($template);
      while (1) {
         my $rv = read($fh, my $rec, $rec_size);
         die "$!\n" if !defined($rv);
         last if !$rv;
         die "Premature EOF\n" if $rv < $rec_size;
         my @fields = unpack($template, $rec);
         ...
      }
Re: Iteratively unpack structure from binary file
by mrdvt92 (Acolyte) on May 18, 2021 at 14:25 UTC

    I just saw this on

    I wrote something like this where I had a template which described the format for the data stream.

    My data stream always looked like this, so it was fairly easy to parse:

    [namespace][object_key][payload length][payload]
      namespace - a single character that allowed for future expansion; we only ever used one namespace.
      object_key - a single character that defined the object, payload length, and payload format
      payload length - was a function of object key but defaulted to unsigned 8 bit int (could be 16 bit or 32 bit int)
      payload - was a binary blob kind of like protocol buffers binary stream
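The framing described above could be read along these lines. This is only a sketch: it assumes the default case where the payload length is a single unsigned 8-bit int, and the helper `read_exact`, the namespace `'A'`, and the object keys `'v'` and `'s'` are hypothetical names invented for the illustration:

```perl
use strict;
use warnings;

# Hypothetical sample stream with two objects: [ns][key][len][payload].
my $stream = pack('a a C/a*', 'A', 'v', pack('C', 3))
           . pack('a a C/a*', 'A', 's', 'hello');
open my $fh, '<:raw', \$stream or die "open: $!";

# Read exactly $len bytes or die (hypothetical helper).
sub read_exact {
    my ($fh, $len) = @_;
    return '' if $len == 0;
    my $got = read($fh, my $buf, $len);
    die "read: $!\n" unless defined $got;
    die "Premature EOF\n" if $got < $len;
    return $buf;
}

my @objects;
until (eof $fh) {
    # Fixed 3-byte header first, then a payload of the announced length.
    my ($ns, $key, $len) = unpack 'a a C', read_exact($fh, 3);
    my $payload = read_exact($fh, $len);
    push @objects, [$ns, $key, $payload];
}
printf "%s/%s: %d byte(s)\n", $_->[0], $_->[1], length $_->[2] for @objects;
```

In a fuller version, the namespace and object key would select which formatter receives `$payload`.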

    So, based on which namespace and object key the payload was read (to length) and then passed to the correct payload formatter like this.

    sub format {
      return [
        [Version1       => [1, 'C',  '%u', undef, 'checkString']],
        [Version1String => [0, 'a*', '%s', undef, 'chompString']],
        [Version2       => [1, 'C',  '%u', undef, 'checkString']],
        [Version2String => [0, 'a*', '%s', undef, 'chompString']],
      ];
    }
      format is list "key" => "plan"
      plan is "length", "unpack", "sprintf display", "scale formula", "caller for extra work" (The plan should have been a hash not an array but I never got there.)

    While reading, the format is actually shifted for each data element read and then passed by reference into the reader so that the reader can modify subsequent formats. In the example above, the checkString function searches the read-ahead buffer until it sees a "\000" and then sets the length on the next picture. (For new development I recommend "string length" + "string".)
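For the "string length" + "string" layout recommended above, pack and unpack already support a length-prefixed item via the `/` template character. A small sketch, assuming an 8-bit length prefix; the field values are made up:

```perl
use strict;
use warnings;

# An 8-bit version, a length-prefixed string, then another 8-bit field.
my $packed = pack 'C C/a* C', 1, 'v1.2-beta', 7;

# unpack reads the count and then exactly that many bytes, so no
# scanning for a terminating "\000" is needed.
my ($version, $label, $next) = unpack 'C C/a* C', $packed;
print "$version $label $next\n";    # prints "1 v1.2-beta 7"
```

The length item itself is consumed by unpack and not returned, so the list lines up with the logical fields.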

    The scale is the list [$a, $b, $c, $d], applied with the formula $val=($a + $b * $val)/($c + $d * $val), which allows anything to be scaled quite easily. (Not my formula, but I cannot find the source anymore.)
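As a worked illustration of that formula, here is a small sketch. The helper name `scale` is hypothetical; the coefficient lists are the ones used by the Speed and Voltage entries in this post:

```perl
use strict;
use warnings;

# Hypothetical helper: apply (n0 + n1*val) / (d0 + d1*val).
sub scale {
    my ($val, $coef) = @_;
    return $val unless $coef;    # entries without a scale pass through
    my ($n0, $n1, $d0, $d1) = @$coef;
    return ($n0 + $n1 * $val) / ($d0 + $d1 * $val);
}

printf "%.2f\n", scale(65535, [  0, 1, 100, 0]);    # Speed:   655.35
printf "%.2f\n", scale(    0, [200, 1, 100, 0]);    # Voltage: 2.00
printf "%.2f\n", scale(  255, [200, 1, 100, 0]);    # Voltage: 4.55
```

With $d set to 0 the formula reduces to a plain offset and divisor, which is all these two entries need.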

    Other examples

    ['Speed'   => [2, 'v', '%u', [  0,1,100,0]]], # speed x 100, 16 bit vax unsigned, 0 to 65535 (655.35) m/s
    ['Voltage' => [1, 'C', '%u', [200,1,100,0]]], # Voltage x100, + 2V, 8 bit char unsigned, 0 (2V) to 255 (4.55V)
    ['Method'  => undef ],                        # 0 length item for using the format for other things like building human displays

    sub format {
      return [
        [ID           => [10, 'C[10]', '%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X']],
        [MajorVersion => [1, 'C', '%u']],
        [MinorVersion => [1, 'C', '%u']],
        [SubVersion   => [1, 'C', '%u']],
      ];
    }

    This was a great start and works quite well, but it only allows reading. I also needed to write the data back, and that was not possible with this format structure.

Re: Iteratively unpack structure from binary file ( ReadBytes, ReadFloat, ReadInt32 )
by Anonymous Monk on Oct 21, 2014 at 06:56 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Carp qw/ /;
    use Data::Dump qw/ dd /;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        ...
        while(not eof $filehandle){
            my $record = ReadJiggy( $filehandle );
            dd( $record );
        }
    }

    # One record: int32, int32, float, float, float
    sub ReadJiggy {
        my( $fh ) = @_;
        return [
            ReadInt32( $fh ),
            ReadInt32( $fh ),
            ReadFloat( $fh ),
            ReadFloat( $fh ),
            ReadFloat( $fh ),
        ];
    }

    sub ReadBytes {
        my( $fh, $bytes ) = @_;
        $bytes or Carp::croak 'Usage: ReadBytes( $filehandle, $bytes ) ';
        my $readed = read $fh, my($data) , $bytes;
        $readed == $bytes or Carp::carp "Only read($readed) but wanted($bytes): $! ## $^E ";
        $data;
    }

    use constant CAN_PACK_QUADS => !! eval { my $f = pack 'q'; 1 };

    sub Int8   { unpack 'c',  $_[-1] }
    sub UInt8  { unpack 'C',  $_[-1] }
    sub Int16  { unpack 's<', $_[-1] }
    sub UInt16 { unpack 'S<', $_[-1] }
    sub Int32  { unpack 'l<', $_[-1] }
    sub UInt32 { unpack 'L<', $_[-1] }
    sub Int64  { unpack( ( CAN_PACK_QUADS ? 'q<' : 'a8' ), $_[-1] ) }
    sub UInt64 { unpack( ( CAN_PACK_QUADS ? 'Q<' : 'a8' ), $_[-1] ) }

    sub ReadInt8   { Int8(   ReadBytes( $_[-1], 8 /8 ) ); }
    sub ReadUInt8  { UInt8(  ReadBytes( $_[-1], 8 /8 ) ); }
    sub ReadInt16  { Int16(  ReadBytes( $_[-1], 16/8 ) ); }
    sub ReadUInt16 { UInt16( ReadBytes( $_[-1], 16/8 ) ); }
    sub ReadInt32  { Int32(  ReadBytes( $_[-1], 32/8 ) ); }
    sub ReadUInt32 { UInt32( ReadBytes( $_[-1], 32/8 ) ); }
    sub ReadInt64  { Int64(  ReadBytes( $_[-1], 64/8 ) ); }
    sub ReadUInt64 { UInt64( ReadBytes( $_[-1], 64/8 ) ); }

    sub Float     { unpack 'f', $_[-1] }
    sub ReadFloat { Float( ReadBytes( $_[-1], 32/8 ) ); }

    #~ perlpacktut says
    #~ f A single-precision float in native format.
    #~ d A double-precision float in native format.
    #~ see perlport
    sub Double     { unpack 'd', $_[-1] }
    sub ReadDouble { Double( ReadBytes( $_[-1], 64/8 ) ); }  # a double is 8 bytes

      That's a lot of calls to read and unpack. It would be far faster to process at least a record at a time, if possible.

      By the way, you forgot to specify endianness for the floating-point types.
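A sketch of what specifying endianness for the floats could look like, assuming the file was written little-endian with 32-bit ints and IEEE 754 single-precision floats (the question does not say, so this is an assumption). The `<` modifier on `f` and `d` needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Assumed on-disk layout: little-endian int32 x 2, float32 x 3.
my $template = 'l<2 f<3';
my $rec_size = length pack $template, (0) x 5;    # 20 where a float is 4 bytes

# Round-trip one record through a single pack/unpack pair.
my $rec    = pack $template, 7, -3, 0.5, 1.25, -2;
my @fields = unpack $template, $rec;
print "$rec_size: @fields\n";    # prints "20: 7 -3 0.5 1.25 -2"
```

This also handles a whole record with one unpack call instead of five.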

        That's a lot of calls to read and unpack. It would be far faster to process at least a record at a time, if possible.

        Yup, but I like the memorable-self-documenting-english-worded-ness of the api ...

        The OP can always streamline his API once he gets things working the way he wants

        By the way, you forgot to specify endianness for the floating-point types.

        It's deliberate, as per the comments copied from the pack docs; pack has more about it ... float/double are very platform-specific even if you specify endianness

        I can't guess how it gets twisted across platforms so I leave it as is

        perlpacktut recommends Convert::Binary::C :) I find "my api" (similar to what I saw in JavaScript/Java/C# ...) easier

Node Type: perlquestion [id://1104462]
Approved by Old_Gray_Bear