PerlMonks  

while behaviour on binary files

by Anonymous Monk
on Nov 14, 2013 at 14:42 UTC ( [id://1062599]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to read data from a binary file. Each record consists of a sequence identifier followed by two sequences of unsigned integers. The format of the file is as follows.
The first 4 bytes are general information; after that the data comes in blocks like this:
next 4 bytes: the sequence identifier
next 2 bytes: the length y of the sequence
next y * 2 bytes: the first sequence
next 2 bytes: a separator before the second sequence
next y * 2 bytes: the second sequence
next 2 bytes: a separator before the following block
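As a rough illustration of the block layout just listed, here is a minimal sketch that builds one block with pack. The helper name make_block is hypothetical, and the little-endian byte order ('V' for 32-bit, 'v' for 16-bit) and the zero-valued separators are assumptions; the post does not state the byte order of the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: build one data block in the layout described above.
# Assumption: little-endian integers ('V' = 32-bit, 'v' = 16-bit) and
# separators that are simply zero; neither is confirmed by the post.
sub make_block {
    my ($id, $seq1, $seq2) = @_;
    die "Both sequences must have the same length\n" if @$seq1 != @$seq2;
    return pack('V v', $id, scalar @$seq1)   # 4-byte ID, 2-byte length y
         . pack('v*', @$seq1)                # y * 2 bytes: first sequence
         . pack('v', 0)                      # 2-byte separator
         . pack('v*', @$seq2)                # y * 2 bytes: second sequence
         . pack('v', 0);                     # 2-byte separator
}

my $block = make_block(7, [1, 2, 3], [4, 5, 6]);
print length($block), "\n";   # 4 + 2 + 6 + 2 + 6 + 2 = 22
```

A block with y entries per sequence therefore occupies 10 + 4 * y bytes, which is why a wrong value for y throws off every later read.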
The number of these blocks varies and is not known when opening the file. I wrote some code to get the sequences out of the binary file:

open(DATFILE, "<$datfilename") or die $!;
binmode(DATFILE);
read(DATFILE, $_, 4, 0);            # Read 4 bytes of the general information
foreach (0..110){
    read(DATFILE, $_, 4, 0);        # Read 4 bytes of the profile ID
    read (DATFILE, $_, 2, 0);       # Read 2 bytes of the sequencelength
    &ReadData ($profilelength);     # read the first sequence
    read (DATFILE, $_, 2, 0);       # Read 2 bytes of the trailing zero
    &ReadData ($profilelength);     # read the second sequence
    read (DATFILE, $_, 2, 0);       # Read 2 bytes of the trailing zero
}

This particular file has 111 data blocks and this code works. The subroutine ReadData puts the data in an array. No problems here.
For the real thing I want to replace the foreach (0..110) with while (<DATFILE>) to keep reading until the end of the file, since I do not know the number of blocks. When I do this the read behaviour changes. With foreach, reading continues at byte 5 as expected; with while, it starts at byte 15. That is inside the first sequence, so the sequence length comes out wrong and the data is corrupt. Could any of the wise monks here kindly explain this while behaviour to me, and perhaps show the proper way to do it? Kind regards, Hans

Replies are listed 'Best First'.
Re: while behaviour on binary files
by ikegami (Patriarch) on Nov 14, 2013 at 14:45 UTC
    while (!eof(DATFILE))

    sub _read {
        my ($fh, $bytes_to_read) = @_;
        my $buf = '';
        while ($bytes_to_read) {
            # Append to $buf; read() may return fewer bytes than requested.
            my $bytes_read = read($fh, $buf, $bytes_to_read, length($buf));
            if (!$bytes_read) {
                die "Error reading: $!\n" if !defined($bytes_read);
                die "Unexpected EOF\n";
            }
            $bytes_to_read -= $bytes_read;
        }
        return $buf;
    }

    sub read_uint32  { my ($fh) = @_; unpack('N', _read($fh, 4)) }    #Or V?
    sub read_uint16  { my ($fh) = @_; unpack('n', _read($fh, 2)) }    #Or v?
    sub read_pstring { my ($fh) = @_; _read($fh, read_uint16($fh)) }

    {
        open(my $fh, '<:raw', $datfilename) or die $!;
        read_uint32($fh);
        while (!eof($fh)) {
            my $profile_id = read_uint32($fh);
            my $seq1 = read_pstring($fh);
            my $seq2 = read_pstring($fh);
            read_uint16($fh);
            ...
        }
    }
    Or if the "trailing zero" is meant to indicate the end of a list of sequences,
    {
        open(my $fh, '<:raw', $datfilename) or die $!;
        read_uint32($fh);
        while (!eof($fh)) {
            my $profile_id = read_uint32($fh);
            my @seqs;
            while (length( my $seq = read_pstring($fh) )) {
                push @seqs, $seq;
            }
            ...
        }
    }

      Thank you very much for your suggestions, ikegami. The first one, while (!eof(DATFILE)), solved my problem. I'll read up on while to understand why !eof changes its behaviour. After all, I came here to seek wisdom, and the journey to the solution often teaches you more than the solution itself. Thanks for pointing out the direction in which I have to travel. I'll also look into your other solutions and use them to make my code better. Sincerely, Hans

        Your misunderstanding is that <DATFILE> actually _reads_ from the file until the next end-of-line character. (You then throw away the data.) It's not the test for end-of-file. eof() is.
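The difference can be seen without a real file. The following is a minimal sketch using an in-memory filehandle; the sample bytes and the helper name offset_after_first_test are made up for illustration, with a newline byte (0x0A) deliberately placed at offset 14.

```perl
use strict;
use warnings;

# 20 bytes of made-up sample data with a newline (0x0A) at offset 14.
my $data = pack('C*', 1 .. 9, 11 .. 15, 10, 16 .. 20);

# Hypothetical helper: report the file position after one loop test.
sub offset_after_first_test {
    my ($use_readline) = @_;
    open(my $fh, '<', \$data) or die $!;
    binmode($fh);
    read($fh, my $header, 4);    # consume the 4 header bytes
    if ($use_readline) {
        # This is what the condition of while (<$fh>) does: it reads up to
        # and including the next "\n", and those bytes are then gone.
        defined(my $line = <$fh>) or die "Unexpected EOF\n";
    }
    else {
        # eof() only tests for end-of-file; the position does not change.
        eof($fh) and die "Unexpected EOF\n";
    }
    return tell($fh);
}

print offset_after_first_test(1), "\n";   # 15
print offset_after_first_test(0), "\n";   # 4
```

With readline, the number of bytes skipped depends on where a 0x0A byte happens to occur in the binary data, so the damage differs from file to file.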

      Nicely done.   ++ ...

Re: while behaviour on binary files
by oiskuu (Hermit) on Nov 14, 2013 at 17:33 UTC
    Just for fun, I considered what the unpack template for the record stream might be.
    my ($inf, @dat) = unpack 'a4 (L (S/(xx) XX.@2/a XX.x4/a xx))<*',
        do +{ local $/; <DATFILE> };
    while (@dat) {
        my ($ID, $s1, $s2) = splice(@dat, 0, 3);
    }
    Not that it's sensible, practical or robust...

      Definite upvote on the preceding comment! unpack is your best friend when dealing with (presumed) fixed-width data records.

      It helps to remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
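For comparison with the single-template approach above, here is a plainer sketch that slurps the file into a buffer and walks it block by block with unpack, following the layout given in the question. The name parse_blocks is hypothetical, and little-endian byte order ('V' and 'v') is an assumption; use 'N' and 'n' if the file turns out to be big-endian.

```perl
use strict;
use warnings;

# Hypothetical parser for the layout described in the question.
# Assumption: little-endian integers; the post does not say which order applies.
sub parse_blocks {
    my ($buf) = @_;
    my $pos = 4;                                  # skip 4 bytes of general info
    my @blocks;
    while ($pos < length $buf) {
        die "Truncated block header at offset $pos\n"
            if $pos + 6 > length $buf;
        my ($id, $len) = unpack('V v', substr($buf, $pos, 6));
        my $size = 10 + 4 * $len;                 # ID, length, 2 sequences, 2 separators
        die "Truncated block at offset $pos\n"
            if $pos + $size > length $buf;
        my @seq1 = unpack("v$len", substr($buf, $pos + 6, 2 * $len));
        my @seq2 = unpack("v$len", substr($buf, $pos + 8 + 2 * $len, 2 * $len));
        push @blocks, { id => $id, seq1 => \@seq1, seq2 => \@seq2 };
        $pos += $size;
    }
    return @blocks;
}

# Made-up sample: 4 header bytes, then one block with y = 2.
my $sample = pack('a4 V v v2 v v2 v', 'HEAD', 7, 2, 10, 20, 0, 30, 40, 0);
my @blocks = parse_blocks($sample);
print "$blocks[0]{id}: @{ $blocks[0]{seq1} } / @{ $blocks[0]{seq2} }\n";
# 7: 10 20 / 30 40
```

Slurping is only reasonable when the file fits comfortably in memory; for larger files a read loop guarded by eof, as in ikegami's reply, avoids loading everything at once.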

Node Type: perlquestion [id://1062599]
Approved by marto