Re^2: Reading binary file in perl having records of different length

by Anonymous Monk
on Jun 17, 2014 at 01:18 UTC (#1090089)

in reply to Re: Reading binary file in perl having records of different length
in thread Reading binary file in perl having records of different length

Ack! Learn a lesson from my own mistake and use constants instead of strings for $expect (note the typo "eyecater"). I was being lazy :-(
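To illustrate the point about constants, here is a minimal sketch using the core constant pragma. The state names are hypothetical, chosen only for this example; the original pseudocode used plain strings.

```perl
use strict;
use warnings;

# Hypothetical state names, for illustration only.
use constant {
    EXPECT_EYECATCHER => 'eyecatcher',
    EXPECT_LENGTH     => 'length',
    EXPECT_RECORD     => 'record',
};

my $expect = EXPECT_EYECATCHER;

# A misspelled constant such as EXPECT_EYECATER is rejected at compile time
# under "use strict", whereas the string 'eyecater' would simply never match.
if ($expect eq EXPECT_EYECATCHER) {
    $expect = EXPECT_LENGTH;
}
```

With strings, the typo only shows up as a branch that is never taken; with constants, perl refuses to compile the script.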

Also, the pseudocode doesn't handle the case of the file not beginning with "==", which you could handle in the first else like so: else { die "expected eyecatcher" unless $expect eq 'eyecatcher' }.

If the logic starts getting too complex, get a little more verbose and break the first if up: if ($expect eq 'eyecatcher') {} elsif ($expect eq 'eyecatcher_after_record') {} and so on. Always cover all branches; at the very least throw an else { die "unexpected" } on there during development.
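A rough sketch of what the more verbose version could look like, with one branch per state and a catch-all else. The record layout is an assumption on my part ("==", then a 16-bit big-endian length, then the payload), and parse_records and the constant names are made up for this example.

```perl
use strict;
use warnings;

use constant {
    FIRST_EYECATCHER => 'first_eyecatcher',
    NEXT_EYECATCHER  => 'eyecatcher_after_record',
    EXPECT_LENGTH    => 'length',
};

# Assumed layout (not confirmed by the thread): "==", a 16-bit big-endian
# payload length, then the payload itself.
sub parse_records {
    my ($fh) = @_;
    my @records;
    my $expect = FIRST_EYECATCHER;

    while (1) {
        my $got = read($fh, my $buf, 2);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                      # clean end of file
        die "truncated file" if $got != 2;

        if ($expect eq FIRST_EYECATCHER) {
            die "file does not begin with eyecatcher" unless $buf eq '==';
            $expect = EXPECT_LENGTH;
        }
        elsif ($expect eq NEXT_EYECATCHER) {
            die "expected eyecatcher after record" unless $buf eq '==';
            $expect = EXPECT_LENGTH;
        }
        elsif ($expect eq EXPECT_LENGTH) {
            my $len = unpack 'n', $buf;         # 'n' = unsigned 16-bit big-endian
            my $n   = read($fh, my $payload, $len);
            die "short record" unless defined $n && $n == $len;
            push @records, $payload;
            $expect = NEXT_EYECATCHER;
        }
        else {
            die "unexpected state '$expect'";   # should be unreachable
        }
    }

    die "file ended while expecting a length" if $expect eq EXPECT_LENGTH;
    return @records;
}
```

To try it without a real file, open an in-memory filehandle: open my $fh, '<', \$data and pass $fh to parse_records.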

And choosing the right names for your states helps a lot. For example, "eyecatcher" might be better named "first_eyecatcher".

Replies are listed 'Best First'.
Re^3: Reading binary file in perl having records of different length
by jaypal (Beadle) on Jun 17, 2014 at 02:06 UTC

    Thank you so much, your improvements to my existing code were great and I am currently trying to modify the code as per your suggestions.

    One thing I have to guard against: if a record has a bad length, the read will run into the next, possibly good, record and pass that data to the parsing subroutine. It will also keep me from processing that good record, because its eyecatcher has probably already been consumed by the previous read.

    I will explore more ways and get back to you. Thanks again for the great comments.

      The algorithm can be modified to handle bad "length" values by adding some logic in the place where it currently dies. If you find a bad record, you could rewind to the last known "good" position via tell and seek and look for the next "==" (an alternative might be to implement your own buffer to look around in, but read should already be buffered). Similar logic would allow you to find "==" that are not aligned properly - the example above doesn't handle the case of the file starting with "x==". (Yet another approach is to read the file byte-by-byte instead of every two bytes - then you would extend your states and have "expect_first_equals", "expect_second_equals", "expect_first_length_byte", and "expect_second_length_byte".)
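A minimal sketch of the tell/seek idea, using only the core read, seek and tell functions. The helper name resync is hypothetical, and it only locates a candidate "==": since "==" may also occur inside a payload, the caller still has to validate whatever follows.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: starting at byte offset $from, scan forward for the
# next "==" and leave the filehandle positioned on it.
# Returns the offset of that "==", or undef if none is found.
sub resync {
    my ($fh, $from) = @_;
    seek($fh, $from, SEEK_SET) or die "seek failed: $!";

    my $prev = '';
    while (read($fh, my $byte, 1)) {
        if ($prev eq '=' && $byte eq '=') {
            my $pos = tell($fh) - 2;            # offset of the first "="
            seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
            return $pos;
        }
        $prev = $byte;
    }
    return;                                     # no further eyecatcher
}
```

In the main loop you would remember my $good = tell($fh) just before each eyecatcher, and on a bad length call resync($fh, $good + 1) so the search starts one byte past the eyecatcher that led you astray.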

        Great points!! I think reading one byte at a time is probably the safest. I am a quality assurance engineer and this code won't be run in production (so I am not reading through a socket where high performance is needed). I will be testing binary files (one at a time) created by my application, and I have around 60-70 test scenarios that can be automated with this parser to ensure the binary file is constructed correctly.
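For what it's worth, the byte-at-a-time variant could be sketched roughly like this, using the four state names suggested above. The layout is again an assumption ("==", two length bytes with the high byte first, then the payload), and parse_bytewise is a made-up name.

```perl
use strict;
use warnings;

use constant {
    EXPECT_FIRST_EQUALS       => 'expect_first_equals',
    EXPECT_SECOND_EQUALS      => 'expect_second_equals',
    EXPECT_FIRST_LENGTH_BYTE  => 'expect_first_length_byte',
    EXPECT_SECOND_LENGTH_BYTE => 'expect_second_length_byte',
};

# Assumed layout: "==", high length byte, low length byte, payload.
sub parse_bytewise {
    my ($fh) = @_;
    my (@records, $hi);
    my $expect = EXPECT_FIRST_EQUALS;

    while (read($fh, my $byte, 1)) {
        if ($expect eq EXPECT_FIRST_EQUALS) {
            # Anything other than "=" is skipped, so a leading "x" is tolerated.
            $expect = EXPECT_SECOND_EQUALS if $byte eq '=';
        }
        elsif ($expect eq EXPECT_SECOND_EQUALS) {
            if ($byte eq '=') { $expect = EXPECT_FIRST_LENGTH_BYTE }
            else              { $expect = EXPECT_FIRST_EQUALS }
        }
        elsif ($expect eq EXPECT_FIRST_LENGTH_BYTE) {
            $hi     = ord($byte);
            $expect = EXPECT_SECOND_LENGTH_BYTE;
        }
        elsif ($expect eq EXPECT_SECOND_LENGTH_BYTE) {
            my $len = ($hi << 8) | ord($byte);
            my $n   = read($fh, my $payload, $len);
            die "short record" unless defined $n && $n == $len;
            push @records, $payload;
            $expect = EXPECT_FIRST_EQUALS;
        }
        else {
            die "unexpected state '$expect'";
        }
    }
    return @records;
}
```

For a test harness you may prefer to die (or log) on skipped bytes instead of silently ignoring them, since stray bytes before an eyecatcher probably mean the file was built incorrectly.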

        tell and seek look interesting. I will go through the docs and some examples. Thanks again for all your help. You've been great!!
