
Re: Reading binary file in perl having records of different length

by Anonymous Monk
on Jun 17, 2014 at 00:58 UTC (#1090088)


in reply to Reading binary file in perl having records of different length

1. Yes. Based on your description, your approach looks good, and it's the approach I would have chosen (of a few possible ones). A few minor improvement suggestions below.

2. Reading the entire file and splitting it might make the code a little "easier", but only if you can safely say that the "==" sequence never appears anywhere else in the file; otherwise it will make things more complicated. You can also expect this method to use more memory and probably to be slower. I'd stick with your current approach.

3. You could achieve this by adding some state to your parsing. To do it "right" would require some rewriting of the code. TIMTOWTDI, I'll suggest one possible approach in pseudocode:

    my $expect = 'eyecatcher';
    my $record;
    while (1) {
        if ($expect eq 'eyecatcher' || $expect eq 'eyecatcher_after_record') {
            if (read_two_bytes() eq '==') {
                process_record($record) if $expect eq 'eyecatcher_after_record';
                $record = undef;
                $expect = 'length';
            }
            else { die "expected eyecatcher" }
        }
        elsif ($expect eq 'length') {
            my $length = read_two_bytes();
            $record = read_bytes($length);
            $expect = 'eyecater_after_record';
        }
    }

I hope this makes sense. You can break out of the while based on when you hit the end of the file, and you may need to then process any unprocessed final $record.
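
To illustrate the caveat in point 2, here is a minimal sketch with made-up sample data. The record layout ("==", a two-byte length, then the payload) is taken from this thread; the pack template 'S' and the payloads are assumptions for the demo. The second payload contains "==", so a naive split produces one chunk too many:

```perl
use strict;
use warnings;

# Hypothetical sample: two records, each "==" + 2-byte length + payload.
# The second payload happens to contain the eyecatcher sequence.
my $data = join '',
    '==', pack('S', 3), 'abc',
    '==', pack('S', 4), 'a==b';

# Naive approach: split the whole file contents on the eyecatcher
# and drop the empty leading field.
my @chunks = grep { length } split /==/, $data;

printf "expected 2 records, split produced %d chunks\n", scalar @chunks;
```

The length field can trip this up as well: a length of 15677 is 0x3D3D, which packs to the bytes "==".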

A few improvement suggestions to your current code. The most major one is that you don't check the return value of read to make sure that you actually got back the number of bytes you requested; you should probably do that to handle any errors in reading the file (such as premature EOF). A small one: you currently declare $length twice, so you can remove the declaration before the loop. Although it doesn't really hurt, I don't think you need the initial unpack; a simple $buffer eq '==' should be enough. The same goes for the second read, where a simple unpack('s',$buffer) should be enough. Another minor nit: you could declare my $xdr inside the loop, so you don't need to treat it like a global and clear it at the end of every loop.
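
A minimal sketch of the read check, under some assumptions: read_exact is a hypothetical helper name (not from your code), the in-memory filehandle stands in for the real file, and the 'S' template should be swapped for whatever matches the file's actual length format.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $want bytes or die.
# Returns undef on a clean EOF (no bytes left at all).
sub read_exact {
    my ($fh, $want) = @_;
    return '' if $want == 0;                  # zero-length record payload
    my $got = read($fh, my $buffer, $want);
    die "read failed: $!" unless defined $got;
    return if $got == 0;                      # clean end of file
    die "premature EOF: wanted $want bytes, got $got" if $got != $want;
    return $buffer;
}

# In-memory sample data standing in for the real file.
my $data = join '',
    '==', pack('S', 5), 'hello',
    '==', pack('S', 5), 'world';
open my $fh, '<:raw', \$data or die "open: $!";

my @records;
while (defined(my $eye = read_exact($fh, 2))) {
    die "expected eyecatcher" unless $eye eq '==';

    my $len_bytes = read_exact($fh, 2);
    die "EOF after eyecatcher" unless defined $len_bytes;
    my $length = unpack 'S', $len_bytes;

    # Declared inside the loop, so nothing needs clearing afterwards.
    my $xdr = read_exact($fh, $length);
    die "EOF inside record" unless defined $xdr;
    push @records, $xdr;
}
print scalar(@records), " records read\n";
```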

Otherwise, good!


Re^2: Reading binary file in perl having records of different length
by Anonymous Monk on Jun 17, 2014 at 01:18 UTC

    Ack! Learn a lesson from my own mistake and use constants instead of strings for $expect (note the typo "eyecater"). I was being lazy :-(

    Also, the pseudocode doesn't handle the case of the file not beginning with "==", which you could handle in the first else like so: else { die "expected eyecatcher" unless $expect eq 'eyecatcher' }.

    If the logic starts getting too complex, get a little more verbose and break the first if up: if ($expect eq 'eyecatcher') {} elsif ($expect eq 'eyecatcher_after_record') {} and so on. Always cover all branches; at the very least throw an else { die "unexpected" } on there during development.

    And choosing the right names for your states helps a lot. For example, "eyecatcher" might be better named "first_eyecatcher".
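
Putting these corrections together, one possible sketch looks like this. The sub name parse_records, the constant names, and the 'S' template are my own assumptions, and records are collected into a list instead of being passed to process_record. With use strict, a misspelled constant is a compile-time error instead of a silent bug.

```perl
use strict;
use warnings;
use constant {
    FIRST_EYECATCHER        => 0,
    EYECATCHER_AFTER_RECORD => 1,
    LENGTH                  => 2,
};

# Hypothetical driver: returns the list of record payloads read from $fh.
sub parse_records {
    my ($fh) = @_;
    my @records;
    my $expect = FIRST_EYECATCHER;
    my $record;
    while (1) {
        if ($expect == FIRST_EYECATCHER || $expect == EYECATCHER_AFTER_RECORD) {
            my $got = read($fh, my $buf, 2);
            die "read failed: $!" unless defined $got;
            last if $got == 0;                      # end of file
            if ($got == 2 && $buf eq '==') {
                push @records, $record if $expect == EYECATCHER_AFTER_RECORD;
                $record = undef;
                $expect = LENGTH;
            }
            else {
                # Before the first eyecatcher, skip two bytes and keep looking.
                die "expected eyecatcher" unless $expect == FIRST_EYECATCHER;
            }
        }
        elsif ($expect == LENGTH) {
            my $got = read($fh, my $buf, 2);
            die "truncated length" unless defined $got && $got == 2;
            my $length = unpack 'S', $buf;
            $got = read($fh, $record, $length);
            die "truncated record" unless defined $got && $got == $length;
            $expect = EYECATCHER_AFTER_RECORD;
        }
        else {
            die "unexpected state $expect";
        }
    }
    # The last record has no eyecatcher after it, so handle it here.
    push @records, $record if $expect == EYECATCHER_AFTER_RECORD;
    return @records;
}
```

Note that skipping in two-byte steps only finds a first "==" that sits at an even offset.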

      Thank you so much, your improvements to my existing code were great and I am currently trying to modify the code as per your suggestions.

      One thing I have to guard against: if a record has a bad length, the read will run into the next, possibly good, record and send that to the parsing subroutine. It will also prevent me from processing the good record, because its eyecatcher may already have been consumed by the previous read.

      Will explore more ways and get back to you. Thanks again for great comments.

        The algorithm can be modified to handle bad "length" values by adding some logic in the place where it currently dies. If you find a bad record, you could rewind to the last known "good" position via tell and seek and look for the next "==" (an alternative might be to implement your own buffer to look around in, but read should already be buffered). Similar logic would allow you to find "==" that are not aligned properly - the example above doesn't handle the case of the file starting with "x==". (Yet another approach is to read the file byte-by-byte instead of every two bytes - then you would extend your states and have "expect_first_equals", "expect_second_equals", "expect_first_length_byte", and "expect_second_length_byte".)
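
A rough sketch of the tell/seek idea, scanning byte by byte so that unaligned "==" sequences are found too. The name resync and the sample data are hypothetical, and the caller is assumed to have saved the last known good offset with tell before attempting the record read.

```perl
use strict;
use warnings;

# Hypothetical helper: rewind to byte offset $from, scan forward for the
# next "==", and leave the filehandle positioned on it.
# Returns the offset of that "==", or undef if none is found.
sub resync {
    my ($fh, $from) = @_;
    seek($fh, $from, 0) or die "seek failed: $!";
    my $prev = '';
    while (read($fh, my $byte, 1)) {
        if ($prev eq '=' && $byte eq '=') {
            my $pos = tell($fh) - 2;
            seek($fh, $pos, 0) or die "seek failed: $!";
            return $pos;
        }
        $prev = $byte;
    }
    return;
}

# Sample: the first record claims 9999 bytes but only 2 follow it.
my $data = '==' . pack('S', 9999) . 'ab' . '==' . pack('S', 2) . 'ok';
open my $fh, '<:raw', \$data or die "open: $!";

# Resume the search just after the first eyecatcher (offset 2).
my $next = resync($fh, 2);
print "next eyecatcher at offset $next\n" if defined $next;
```

A "==" that happens to occur inside a payload would still be picked up as a false positive, so the length that follows probably deserves a sanity check as well.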
