http://www.perlmonks.org?node_id=1090200


in reply to Reading binary file in perl having records of different length

I see a fellow monk has already pointed out the alignment issue with the "==" eyecatcher. Another point worth making is that reading the file as a whole can indeed simplify parsing. Consider:

$data = do { local $/; <$fh> };   # slurp the whole file

while ($data =~ m/==/) {
    # $' is everything after the first "==": a 2-byte big-endian count,
    # the record, then the remaining data. NB: with truly binary payloads,
    # "n/a a*" (lowercase) avoids unpack's "A" stripping trailing spaces
    # and NULs from the record.
    ($rec, $data) = unpack("n/A A*", $');
    process($rec);
}


Re^2: Reading binary file in perl having records of different length
by jaypal (Beadle) on Jun 17, 2014 at 22:35 UTC

    Thanks for the snippet. Yes, that is an option I am exploring as well. Since the application is still being developed I don't have a production-grade binary yet, only a test file with 5 records, so I can't yet measure the performance benefit.
    What I am doing is writing a separate parsing subroutine that expects one record at a time, so how I read the file is independent of how I parse it. I am also adding a run-time option so the user can choose between reading the binary in slurp mode or byte mode. This is what I have for slurp mode:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:seek);
    use Data::Dumper;

    open my $fh, '<', 'Test.NEW' or die "File not found: $!";
    binmode($fh);
    my $data = do { local $/; <$fh> };

    # work on a hex dump so the records can be split on the "3d3d" ("==")
    # eyecatcher; every original byte becomes two hex characters
    my @data = split /(?=3d3d)/, unpack('H*', $data);

    for my $xdr (@data) {
        open my $fh1, '<', \$xdr;          # create a filehandle from a scalar
        read $fh1, my $buffer, 4;          # read the eyecatcher (2 bytes = 4 hex chars)
        read $fh1, $buffer, 4;             # read the length field (2 bytes = 4 hex chars)
        my $length = hex $buffer;          # record length in bytes, in decimal
        seek $fh1, 0, SEEK_SET;            # reset the offset
        read $fh1, $buffer, 2 * $length;   # read the whole record to avoid garbage bytes
        process($buffer);                  # $buffer holds the hex form of one record
    }
    #print Dumper \@data;
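    For byte mode I am thinking of something along these lines (an untested sketch: like the slurp version above it assumes the two-byte big-endian length counts the whole record, eyecatcher and length field included, and that process() is my parsing subroutine; error checks are omitted for brevity):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:seek);

    open my $fh, '<', 'Test.NEW' or die "File not found: $!";
    binmode($fh);

    # byte mode: pull one record at a time straight from the filehandle
    while (read $fh, my $header, 4) {         # 2-byte eyecatcher + 2-byte length
        my $length = unpack 'x2 n', $header;  # skip the eyecatcher, take the length
        seek $fh, -4, SEEK_CUR;               # rewind so the record includes its header
        read $fh, my $record, $length;        # the length covers the whole record
        process($record);
    }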
      I don't have a production grade binary yet

      Since you are dealing with binary data I don't think your "eyecatcher" is a good idea, as "\x3d\x3d" ("==") could legitimately be part of your data. I think it better to rely on a record starting with a byte count immediately followed by a fixed-length header string that can easily be identified and validated, perhaps by regular expression, e.g. /^Record\s\d{5}$/ for "Record 00001", "Record 02784" etc. The chance of such a string appearing in the binary data is very much smaller and should make unravelling bad records far easier.
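      For illustration, a reader for such a layout might look like this sketch (it assumes a 2-byte big-endian count covering just the payload, followed by a 12-byte header such as "Record 00001"; the file name and process() are placeholders):

      use strict;
      use warnings;

      open my $fh, '<', 'records.bin' or die "Cannot open: $!";
      binmode $fh;

      while (read $fh, my $prefix, 14) {               # 2-byte count + 12-byte header
          my ($count, $header) = unpack 'n a12', $prefix;
          $header =~ /^Record\s\d{5}$/
              or die "Bad header '$header' - stream out of alignment?";
          (read($fh, my $payload, $count) // 0) == $count
              or die "Short payload for '$header'";
          process($payload);                           # your per-record parser
      }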

      I don't know if you have any control over the format of the binary files, but I feel that the "==" between records is just storing up trouble and should be reconsidered. It is so short that it is quite likely to appear in the data and, by preceding the record, it complicates record alignment.

      Cheers,

      JohnGG

        I agree. I think that you should code this routine to be suspicious of the data, precisely so that you can then rely on it completely. For example, I presume the first two bytes of the file should be an eyecatcher: die if they're not. The next two bytes should decode to a plausible length ... die if they don't. Read the specified number of bytes ... die if you can't. The next thing that you read should either be "nothing" (end of file), or it should be another eyecatcher; rinse and repeat.
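        For instance, keeping the original layout (a 2-byte "==" eyecatcher followed by a 2-byte big-endian length that counts the whole record), such a suspicious loop might look like this sketch (the file name and process() are placeholders):

        use strict;
        use warnings;

        open my $fh, '<', 'Test.NEW' or die "Cannot open: $!";
        binmode $fh;

        while (1) {
            my $got = read($fh, my $eye, 2);
            die "Read error: $!" unless defined $got;
            last if $got == 0;                                # "nothing" left: clean end of file
            die "Missing eyecatcher" if $got < 2 or $eye ne '==';

            (read($fh, my $raw, 2) // 0) == 2 or die "Truncated length field";
            my $length = unpack 'n', $raw;
            die "Implausible length $length" if $length < 4;  # must at least cover its header

            my $want = $length - 4;                           # payload bytes still to read
            (read($fh, my $payload, $want) // 0) == $want
                or die "Truncated record body";

            process($eye . $raw . $payload);                  # hand one whole record on
        }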

        Notice that, in this way, if the program runs successfully, then you can indeed assert that the file's structure must be good. Since big files can and do become corrupt sometimes (and come from other people's software systems), this amount of caution is not paranoia. Not at all. (In fact, in a production setting, I would have a series of .t test-files that prove, and re-prove, that all of these die calls actually work.)
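        A minimal .t sketch of that idea, assuming the loop above has been wrapped in a (hypothetical) MyReader::read_records() that takes a file name:

        use strict;
        use warnings;
        use Test::More tests => 2;
        use MyReader;                      # hypothetical module wrapping the read loop

        # helper: write a small binary file that exercises one failure mode
        sub write_file {
            my ($name, $bytes) = @_;
            open my $out, '>', $name or die $!;
            binmode $out;
            print {$out} $bytes;
            close $out;
        }

        # a file that stops mid-record must trigger the "Truncated" die
        write_file('truncated.bin', '==' . pack('n', 100) . 'only a few bytes');
        eval { MyReader::read_records('truncated.bin') };
        like $@, qr/Truncated/, 'truncated file is rejected';

        # a file that does not start with the eyecatcher must also die
        write_file('garbage.bin', 'XX' . pack('n', 10) . 'x' x 6);
        eval { MyReader::read_records('garbage.bin') };
        like $@, qr/eyecatcher/, 'bad eyecatcher is rejected';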

        There will be no harm in simply reading two bytes, then two bytes, then n bytes, and so on, letting Perl and the filesystem handle all of the buffering for you. It really doesn't matter how big the file is.