in reply to reading binary files with Perl

Something like this should do it. See the docs and/or ask for anything you do not understand.

#! perl -slw
use strict;

my @grid;

open my $fh, '<:raw', 'the file' or die $!;

while( 1 ) {
    my( $recSize, $dummy, $record );
    sysread( $fh, $recSize, 4 ) or last;
    $recSize = unpack 'N', $recSize; ##(*)
    sysread( $fh, $record, $recSize ) == $recSize
        or die "truncated record";
    sysread( $fh, $dummy, 4 ) == 4
        and unpack( 'N', $dummy ) == $recSize ##(*)
        or die "missing or invalid trailer";
    ## (*) You may need V depending upon which platform your file was created on
    push @grid, [ unpack 'N*', $record ];
}

close $fh;

## @grid should now contain your data
## Addressable in the usual $grid[ X ][ Y ] manner.
## Though it might be $array[ Y ][ X ]
## I forget which order FORTRAN writes arrays in?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: reading binary files with Perl
by ikegami (Pope) on Nov 16, 2006 at 16:29 UTC
    • Why sysread over read? The only difference is that read is buffered, which is a good thing. I'd replace sysread with read.

    • N* for floats?

    • I don't think a smaller than expected return value is an error. It simply means you need to call the read function again.
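As a rough illustration of the last point, here is a minimal sketch of a loop that keeps calling read until the requested number of bytes has arrived or end of file is reached. The helper name read_exact is hypothetical (it is not from the code above), and an in-memory file handle stands in for the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: call read() repeatedly until $want bytes have been
# collected, EOF is reached, or an error occurs. Returns the bytes read.
sub read_exact {
    my( $fh, $want ) = @_;
    my $buf = '';
    while( length( $buf ) < $want ) {
        # The 4th argument appends at the current end of $buf.
        my $got = read( $fh, $buf, $want - length( $buf ), length( $buf ) );
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # EOF
    }
    return $buf;
}

# An in-memory handle holding 10 bytes stands in for the data file.
my $data = 'abcdefghij';
open my $fh, '<', \$data or die $!;
my $first = read_exact( $fh, 4 );      # a full 4-byte read
my $rest  = read_exact( $fh, 100 );    # hits EOF after 6 bytes
close $fh;

print length( $first ), ' ', length( $rest ), "\n";
```

With a helper like this, the caller only has to compare the length of the returned string with the length it asked for, and a short result then means end of file rather than "try again".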

      Why sysread over read? The only difference is that read is buffered, which is a good thing. I'd replace sysread with read.

      Partially habit. On my system, at least at some point in the past, the interaction between Perl's buffering and the OS caching was less productive than using the system's caching alone.

      Partially because in perlfunc sysread it says:

      It bypasses buffered IO, so mixing this with other kinds of reads, print, write, seek, tell, or eof can cause confusion because the perlio or stdio layers usually buffers data.

      And since I used '<:raw', which (as I understand it) bypasses PerlIO layers, it seemed prudent to avoid buffered IO calls.

      N* for floats?

      Mea culpa. The code is untested as I don't have a relevant data file, and could not mock one up because I do not know what system it was written on.

      Basically, the code I posted was intended as an example of how to proceed, not production ready copy&paste.
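For the float question, a minimal sketch of what the unpack might look like, under assumptions the thread does not confirm: 32-bit IEEE single-precision values, big-endian byte order, and a Perl new enough (5.10 or later) to accept the > byte-order modifier on f. The mock record below is built in memory purely for illustration; with a little-endian file the templates would presumably be V and f< instead.

```perl
use strict;
use warnings;

# Build one mock record: 4-byte length, payload, 4-byte length again.
# Assumption: big-endian markers and 32-bit single-precision floats.
my @values  = ( 1.5, -2.25, 0.5 );
my $payload = pack 'f>*', @values;
my $marker  = pack 'N', length $payload;
my $record  = $marker . $payload . $marker;

# Decode it: leading length, the floats, then the trailing length.
my $size    = unpack 'N', substr( $record, 0, 4 );
my @floats  = unpack 'f>*', substr( $record, 4, $size );
my $trailer = unpack 'N', substr( $record, 4 + $size, 4 );

die "missing or invalid trailer" unless $trailer == $size;
print "@floats\n";
```

If the file holds double-precision values, d would replace f in both templates.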

      I don't think a smaller than expected return value is an error. It simply means you need to call the read function again.

      I think that's true when reading from a stream device--terminal, socket or pipe--but for a disk file, if you do not get the requested number of bytes, (I believe) it means end of file.

      I'm open to correction on that, but I do not see the circumstances in which a disk read would fail to return the requested number of bytes if they are available.
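One way to check that belief on a given system is a small experiment like the sketch below. It writes a 10-byte temporary file (the size and the 8-byte request are arbitrary choices for the demonstration) and records what successive sysread calls return. It only shows the behaviour for a small local file; it does not prove anything about interrupted calls or network file systems.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create a 10-byte scratch file.
my( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} 'x' x 10;
close $out or die $!;

# Ask for 8 bytes three times and record how many arrive each time.
open my $in, '<:raw', $path or die $!;
my @counts;
for( 1 .. 3 ) {
    my $n = sysread( $in, my $buf, 8 );
    die "sysread failed: $!" unless defined $n;
    push @counts, $n;
}
close $in;

print "@counts\n";    # 8 2 0
```

The short count appears only on the read that runs into end of file, and the following read returns 0.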



        And since I used '<:raw', which (as I understand it) bypasses PerlIO layers,

        "The stream will still be buffered." That's a direct quote from PerlIO's :raw documentation.
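A quick way to see which layers a handle opened with '<:raw' actually has is PerlIO::get_layers. The sketch below uses a scratch file created just for the test. The exact list is platform dependent, so no output is shown, but on a typical Unix build it would be expected to include the buffering perlio layer.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create an empty scratch file to open.
my( $tmp, $path ) = tempfile( UNLINK => 1 );
close $tmp or die $!;

# Open it the same way as the code under discussion and list its layers.
open my $fh, '<:raw', $path or die $!;
my @layers = PerlIO::get_layers( $fh );
close $fh;

print "@layers\n";
```

If perlio appears in the list, read on that handle is buffered even though :raw was requested.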