jpk1292000 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm new to the board and I've been struggling with this problem for some time now. I hope someone can give me some suggestions. I am trying to read a binary file with the following format (the 4-byte integers and 4-byte floats are in the native format of the machine):
    *** First record
    (4 byte integer) .. byte size of record (4*N) (f77 header)
    (4 byte float)   .. value 1
    (4 byte float)   .. value 2
    ...
    (4 byte float)   .. value N
    N = number of grid points in the field
    (4 byte integer) .. byte size of record (4*N) (f77 trailer)
    *** Second record
    (4 byte integer) .. byte size of record (4*N) (f77 header)
    (4 byte float)   .. value 1
    (4 byte float)   .. value 2
    ...
    (4 byte float)   .. value N
    N = number of grid points in the field
    (4 byte integer) .. byte size of record (4*N) (f77 trailer)
The data is meteorological data (temperature in degrees K) on a 614 x 428 grid. I tried coding up a reader for this, but am getting nonsensical results. Here is the code:
    my $out_file = "/dicast2-papp/DICAST/smg_data/" . $gfn . ".bin";  # path to binary file
    my $template = "if262792i";  # binary layout (integer, 262792 floats, integer) as described in
                                 # the format documentation above (not sure if this is correct)
    my $record_length = 4;       # not sure what record_length is supposed to represent (number of
                                 # values in 1st record, or should it be length of variable [4 bytes])
    my (@fields, $record);
    open (FH, $out_file) || die "couldn't open $out_file\n";
    until (eof(FH)) {
        my $val_of_read = read (FH, $record, $record_length) == $record_length
            or die "short read\n";
        @fields = unpack ($template, $record);
        print "field = $fields[0]\n";
    }
The results I get when I print out the first field are nonsensical (negative numbers, etc.). I think the issue is that I'm not setting up my template and record length properly. Also, how do I find out what "the native format of the machine" is?
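On the "native format" question, a minimal sketch, assuming the file was written on a machine with the same architecture as the one reading it (strictly, what matters is the machine that *wrote* the file). It checks byte order two ways, via Perl's `Config` module and by comparing a native pack against an explicit little-endian one, and reports the size of a native `f` float. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Perl records the byte order it was built for: "1234" or "12345678" means
# little-endian, "4321" or "87654321" means big-endian.
print "Config byteorder: $Config{byteorder}\n";

# Independent check: pack 1 as a native unsigned 32-bit integer ('L') and as
# an explicitly little-endian one ('V'). Identical bytes => little-endian.
my $little = pack('L', 1) eq pack('V', 1);
print 'Native byte order: ',
    ($little ? "little-endian (foreign headers: try 'V')"
             : "big-endian (foreign headers: try 'N')"), "\n";

# Size of a native float ('f'); it should be 4 bytes to match the file.
my $float_size = length pack('f', 0);
print "native float size: $float_size bytes\n";
```

If the writing machine had a different byte order, the native `i`/`l`/`f` templates will give garbage, typically huge or negative numbers like the ones described above.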

Re: reading binary files with Perl
by davorg (Chancellor) on Nov 16, 2006 at 15:53 UTC

    You can find out more about how "read" works by reading its documentation.

    From there, you'll find out that the third parameter (your $record_length) is the number of bytes to read from the filehandle[1]. As your template is set up to handle all of the data for one record in one go, you'll need to read one record's worth of data. That's 4 * (1 + 262792 + 1) = 1,051,176 bytes of data. Currently you're reading four bytes, and the template is looking for a lot more.
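    A minimal sketch of reading one whole record in a single read, assuming native byte order. To keep it self-contained, it builds a hypothetical 6-point record in memory instead of using the real 262792-point file. It uses 'l' (always 32 bits) for the header and trailer. The record length passed to read() is the full 4 * (1 + N + 1) bytes, so the template and the buffer agree.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical small grid; the real file has N = 614 * 428 = 262792 floats.
my $n      = 6;
my @values = (271.5, 272.0, 273.25, 274.5, 275.0, 276.75);

# Build one Fortran unformatted record in native format:
# 4-byte length header, N 4-byte floats, 4-byte length trailer.
my $bytes = 4 * $n;
my $data  = pack("l f$n l", $bytes, @values, $bytes);

# Read it back through an in-memory filehandle.
open my $fh, '<', \$data or die "can't open in-memory file: $!";
binmode $fh;

# read()'s third argument is a byte count: header + N floats + trailer.
my $record_length = 4 * (1 + $n + 1);
my $template      = "l f$n l";

my $got = read($fh, my $record, $record_length);
die "short read\n" unless defined $got && $got == $record_length;

# The first field is the header, the last is the trailer, the rest are data.
my ($head, @fields) = unpack $template, $record;
my $tail = pop @fields;
die "header/trailer mismatch\n" unless $head == $bytes && $tail == $bytes;
print "first value = $fields[0]\n";
close $fh;
```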

    The documentation for unpack says this:

    If there are more pack codes or if the repeat count of a field or a group is larger than what the remainder of the input string allows, the result is not well defined: in some cases, the repeat count is decreased, or unpack() will produce null strings or zeroes, or terminate with an error. If the input string is longer than one described by the TEMPLATE, the rest is ignored.

    [1] Actually, the number of _characters_ but let's assume single byte characters for the time being.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: reading binary files with Perl
by ikegami (Pope) on Nov 16, 2006 at 16:04 UTC

    Depending on your OS, another problem is the lack of binmode. Add binmode(FH) after the open so that Perl doesn't alter the data (on Windows, for example, text mode translates CRLF pairs to LF). Not all OSes require binmode, but it's safe to use it on all of them.

    Oh, and I'd use l instead of i. l is always 32 bits, but i is a native C int and is not guaranteed to be 4 bytes.
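    A small sketch of why binmode matters, using the ':crlf' layer to imitate Windows text mode on any OS. The value 2573 is chosen arbitrarily because its little-endian bytes begin with 0x0D 0x0A (CR LF). Through ':crlf' that pair collapses to a single LF, so the 4-byte read comes up short. Through ':raw' (what binmode gives you) all 4 bytes arrive intact. The last lines print the sizes of 'i' and 'l' on the current machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one little-endian 32-bit integer, 2573 (0x00000A0D). Its first two
# bytes are 0x0D 0x0A, which is exactly a CR LF pair.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack('V', 2573);
close $out or die "close: $!";

# Read it back through ':crlf' (roughly what text mode does on Windows)
# and through ':raw' (binary mode).
my (%bytes_read, %value_read);
for my $layer (':crlf', ':raw') {
    open my $in, "<$layer", $path or die "open $path: $!";
    my $got = read($in, my $buf, 4);
    $bytes_read{$layer} = $got;
    $value_read{$layer} = $got == 4 ? unpack('V', $buf) : undef;
    printf "%-6s read %d bytes\n", $layer, $got;
    close $in;
}

# 'l' is always exactly 32 bits; 'i' is a native C int, whose size can vary.
printf "'i' is %d bytes here, 'l' is %d bytes\n",
    length pack('i', 0), length pack('l', 0);
```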

      Got it working. Thanks for the help. My problem was two-fold: I wasn't using the correct record length, and I wasn't using binmode. Once I fixed these two issues, it worked.
Re: reading binary files with Perl
by BrowserUk (Pope) on Nov 16, 2006 at 16:13 UTC

    Something like this should do it. See the docs and/or ask for anything you do not understand.

        #! perl -slw
        use strict;

        my @grid;

        open my $fh, '<:raw', 'the file' or die $!;

        while( 1 ) {
            my( $recSize, $dummy, $record );

            sysread( $fh, $recSize, 4 ) or last;

            $recSize = unpack 'N', $recSize; ##(*)

            sysread( $fh, $record, $recSize ) == $recSize
                or die "truncated record";

            sysread( $fh, $dummy, 4 ) == 4
                and unpack( 'N', $dummy ) == $recSize ##(*)
                or die "missing or invalid trailer";

            ## (*) You may need V depending upon which platform your file was created on

            push @grid, [ unpack 'N*', $record ];
        }

        close $fh;

        ## @grid should now contain your data
        ## Addressable in the usual $grid[ X ][ Y ] manner.
        ## Though it might be $array[ Y ][ X ]
        ## I forget which order FORTRAN writes arrays in?

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      • Why sysread over read? The only difference is that read is buffered, which is a good thing. I'd replace sysread with read.

      • N* for floats?

      • I don't think a smaller than expected return value is an error. It simply means you need to call the read function again.

        Why sysread over read? The only difference is that read is buffered, which is a good thing. I'd replace sysread with read.

        Partially habit. On my system, at least at some point in the past, the interaction between Perl's buffering and the OS caching was less productive than using the system's caching alone.

        Partially because in perlfunc sysread it says:

        It bypasses buffered IO, so mixing this with other kinds of reads, print, write, seek, tell, or eof can cause confusion because the perlio or stdio layers usually buffer data.

        And since I used '<:raw', which (as I understand it) bypasses PerlIO layers, it seems prudent to avoid buffered IO calls.

        N* for floats?

        Mea culpa. The code is untested as I don't have a relevant data file, and could not mock one up because I do not know what system it was written on.

        Basically, the code I posted was intended as an example of how to proceed, not production ready copy&paste.
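        For comparison, a sketch of the same header-driven loop with the float problem addressed: 'l' for the native 32-bit record length and 'f*' for the native 4-byte floats, read with buffered read() as suggested above. It assumes the file is in native byte order ('V' or 'N' for the lengths otherwise). It runs on two hypothetical 3-point records built in memory, so it is untested against the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical 3-point records in native byte order (the real file has
# 262792 floats per record). Each is: length header, floats, length trailer.
my @rec1 = (250.5, 251.0, 251.5);
my @rec2 = (260.5, 261.0, 261.5);
my $data = join '', map { my $n = @$_; pack "l f$n l", 4 * $n, @$_, 4 * $n } \@rec1, \@rec2;

open my $fh, '<', \$data or die "open: $!";
binmode $fh;

my @grid;
while (read($fh, my $head, 4)) {
    my $size = unpack 'l', $head;   # native 32-bit length ('V'/'N' if foreign)
    read($fh, my $record, $size) == $size or die "truncated record\n";
    read($fh, my $tail, 4) == 4 && unpack('l', $tail) == $size
        or die "missing or invalid trailer\n";
    # 'f*' decodes native 4-byte floats; 'N*' would decode unsigned integers.
    push @grid, [ unpack 'f*', $record ];
}
close $fh;
printf "%d records, first value %s\n", scalar @grid, $grid[0][0];
```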

        I don't think a smaller than expected return value is an error. It simply means you need to call the read function again.

        I think that's true when reading from a stream device (terminal, socket or pipe), but for a disk file, if you do not get the requested number of bytes, I believe it means end of file.

        I'm open to correction on that, but I do not see under what circumstances a disk read would fail to return the requested number of bytes if they are available.
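        A sketch that sidesteps the question either way: a hypothetical helper, read_exactly, that keeps calling read() until it has the requested number of bytes or hits EOF. On a disk file it usually loops once. On a pipe or socket it absorbs partial reads. A result shorter than requested then unambiguously means end of data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call read() until $want bytes have arrived or the
# handle reports EOF. Returns what was obtained (fewer than $want only at
# EOF); dies on an I/O error.
sub read_exactly {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        # The 4th argument (offset) appends to $buf instead of overwriting it.
        my $got = read($fh, $buf, $want - length($buf), length($buf));
        die "read error: $!\n" unless defined $got;
        last if $got == 0;    # EOF
    }
    return $buf;
}

# Demo on an in-memory file holding two 4-byte record-length fields.
my $data = pack 'l l', 24, 24;
open my $fh, '<', \$data or die "open: $!";
binmode $fh;

my $first  = read_exactly($fh, 4);
my $second = read_exactly($fh, 4);
my $third  = read_exactly($fh, 4);    # nothing left, so this is ''
printf "%d %d %d\n", length $first, length $second, length $third;
close $fh;
```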

Re: reading binary files with Perl
by jmcnamara (Monsignor) on Nov 16, 2006 at 16:33 UTC

    Try something like the following:
        #!/usr/bin/perl -w

        use strict;

        open FILE, 'file.bin' or die "Couldn't open file: $!\n";
        binmode FILE;

        my $record = 1;
        my $buffer = '';

        while ( read( FILE, $buffer, 4 ) ) {

            my $record_length = unpack 'N', $buffer;
            my $num_fields    = $record_length / 4;

            printf "Record %d. Number of fields = %d\n", $record, $num_fields;

            for (1 .. $num_fields ) {
                read( FILE, $buffer, 4 );
                my $temperature = unpack 'f', $buffer;

                # Or if the above gives the wrong result try this:
                #my $temperature = unpack 'f', reverse $buffer;

                print "\t", $temperature, "\n";
            }

            # Read but ignore record trailer.
            read( FILE, $buffer, 4 );

            print "\n";
            $record++;
        }

        __END__
    If the number of fields is wrong, substitute unpack 'V' for unpack 'N'. If the float is wrong, try the reversed value that is commented out.

    Update: Added read for trailer.
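    Since the grid size is known, the header can be used to detect the file's byte order instead of guessing: it must equal 4 * 262792 = 1,051,168 under exactly one of 'V' or 'N'. A sketch, assuming Perl 5.10 or later for the '<' and '>' endianness modifiers on 'f'. The helper names are hypothetical, and the demo uses a 3-point big-endian record built in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the 4-byte record header and the number of grid
# points expected, decide whether the data is little- or big-endian.
sub detect_byte_order {
    my ($header, $n_points) = @_;
    my $expected = 4 * $n_points;
    return '<' if unpack('V', $header) == $expected;   # little-endian
    return '>' if unpack('N', $header) == $expected;   # big-endian
    die "header matches neither byte order\n";
}

# Hypothetical helper: unpack floats with an explicit byte order
# (the '<' / '>' modifiers need Perl 5.10 or later).
sub unpack_floats {
    my ($order, $bytes) = @_;
    return unpack "f$order*", $bytes;
}

# Demo: a big-endian record for a hypothetical 3-point grid.
my @temps  = (250.5, 260.25, 270.0);
my $n      = @temps;
my $record = pack "N f>$n N", 4 * $n, @temps, 4 * $n;

my $order  = detect_byte_order(substr($record, 0, 4), $n);
my @values = unpack_floats($order, substr($record, 4, 4 * $n));
print "byte order '$order': @values\n";
```

    This replaces the per-value reverse trick with a single unpack per record, which should also be noticeably faster on 262792 values.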

    --
    John.