comment on

1. Yes. Based on your description, your approach looks good, and it's the approach I would have chosen (of a few possible ones). A few minor improvement suggestions below.

2. Reading the entire file and spliting it might make the code a little "easier", if you can safely say that the "==" sequence never appears anywhere else in the file - otherwise it'll make things more complicated! Also it can be expected that this method would take more memory and probably be slower. I'd stick with your current approach.

3. You could achieve this by adding some state to your parsing. To do it "right" would require some rewriting of the code. TIMTOWTDI, I'll suggest one possible approach in pseudocode:

my $expect = 'eyecatcher';
my $record;
while (1) {
  if ($expect eq 'eyecatcher' || $expect eq 'eyecatcher_after_record')
+ {
    if (read_two_bytes() eq '==') {
      process_record($record) if $expect eq 'eyecatcher_after_record';
      $record = undef;
      $expect = 'length';
    }
    else { die "expected eyecatcher" }
  }
  elsif ($expect eq 'length') {
    my $length = read_two_bytes();
    $record = read_bytes($length);
    $expect = 'eyecater_after_record';
  }
}
[download]

I hope this makes sense. You can break out of the while based on when you hit the end of the file, and you may need to then process any unprocessed final $record.

A few improvement suggestions to your current code: The most major one is that you don't check the return value of read to make sure that you actually got back the number of bytes you requested, you should probably do that to handle any errors in reading the file (such as premature EOF). A small one: You currently declare $length twice, you can remove the declaration before the loop. Although it doesn't really hurt, I don't think you need the initial unpack, a simple $buffer eq '==' should be enough. Same thing on the second read, a simple unpack('s',$buffer) should be enough. And another minor nit might be that you could declare my $xdr inside the loop, so you don't need to treat it like a global and clear it at the end of every loop.

Otherwise, good!

In reply to Re: Reading binary file in perl having records of different length by Anonymous Monk
in thread Reading binary file in perl having records of different length by jaypal

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Perl-Sensitive Sunglasses
	PerlMonks