PerlMonks  

Re^2: Reading binary file in perl having records of different length

by Anonymous Monk
on Jun 17, 2014 at 01:18 UTC


in reply to Re: Reading binary file in perl having records of different length
in thread Reading binary file in perl having records of different length

Ack! Learn a lesson from my own mistake and use constants instead of strings for $expect (note the typo "eyecater"). I was being lazy :-(
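To illustrate the point about constants, here is a minimal sketch. The state names are made up for the example (they are not taken from the earlier pseudocode). The idea is that under "use strict" a misspelled constant is rejected at compile time, while a misspelled string just quietly never matches.

```perl
use strict;
use warnings;

# Hypothetical state names, declared once as constants.
# Writing EXPECT_EYECATER by mistake would fail to compile under
# "use strict" (bareword not allowed), unlike the string 'eyecater'.
use constant {
    EXPECT_EYECATCHER => 0,
    EXPECT_LENGTH     => 1,
    EXPECT_RECORD     => 2,
};

my $expect = EXPECT_EYECATCHER;

# Advance the state once the eyecatcher has been seen.
$expect = EXPECT_LENGTH if $expect == EXPECT_EYECATCHER;

print "state is now $expect\n";
```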

Also, the pseudocode doesn't handle the case of the file not beginning with "==", which you could handle in the first else like so: else { die "expected eyecatcher" unless $expect eq 'eyecatcher' }.

If the logic starts getting too complex, get a little more verbose and break the first if up: if ($expect eq 'eyecatcher') {} elsif ($expect eq 'eyecatcher_after_record') {} and so on. Always cover all branches; at the very least throw an else { die "unexpected" } on there during development.
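A rough sketch of what "cover all branches" could look like. The record layout here is an assumption made for the example: a two-byte "==" eyecatcher, a two-byte big-endian length, then the payload. parse_records and the ST_* names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical states; string values keep the die messages readable.
use constant {
    ST_FIRST_EYECATCHER => 'first_eyecatcher',
    ST_EYECATCHER       => 'eyecatcher_after_record',
    ST_LENGTH           => 'length',
};

sub parse_records {
    my ($bytes) = @_;
    open my $fh, '<:raw', \$bytes or die "cannot open in-memory file: $!";
    my @records;
    my $expect = ST_FIRST_EYECATCHER;
    while (read($fh, my $buf, 2)) {
        if ($expect eq ST_FIRST_EYECATCHER) {
            die "file does not begin with eyecatcher" unless $buf eq '==';
            $expect = ST_LENGTH;
        }
        elsif ($expect eq ST_EYECATCHER) {
            die "expected eyecatcher after record" unless $buf eq '==';
            $expect = ST_LENGTH;
        }
        elsif ($expect eq ST_LENGTH) {
            die "truncated length field" unless length($buf) == 2;
            my $len = unpack 'n', $buf;    # assumed 16-bit big-endian
            my $got = read($fh, my $record, $len);
            die "short record" unless defined $got && $got == $len;
            push @records, $record;
            $expect = ST_EYECATCHER;
        }
        else {
            die "unexpected state '$expect'";    # catches forgotten states
        }
    }
    die "file ended inside a record header" if $expect eq ST_LENGTH;
    return @records;
}

my $data = '==' . pack('n', 3) . 'abc' . '==' . pack('n', 2) . 'hi';
print join(',', parse_records($data)), "\n";    # prints "abc,hi"
```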

And choosing the right names for your states helps a lot. For example, "eyecatcher" might be better named "first_eyecatcher".


Re^3: Reading binary file in perl having records of different length
by jaypal (Beadle) on Jun 17, 2014 at 02:06 UTC

    Thank you so much, your improvements to my existing code were great and I am currently trying to modify the code as per your suggestions.

    One thing I have to handle: if a record has a bad length, the read will run into the next (possibly good) record and send that to the parsing subroutine. It will also prevent me from processing the good record, because its eyecatcher will probably already have been consumed by the previous read.

    Will explore more ways and get back to you. Thanks again for great comments.

      The algorithm can be modified to handle bad "length" values by adding some logic in the place where it currently dies. If you find a bad record, you could rewind to the last known "good" position via tell and seek and look for the next "==" (an alternative might be to implement your own buffer to look around in, but read should already be buffered). Similar logic would allow you to find "==" that are not aligned properly - the example above doesn't handle the case of the file starting with "x==". (Yet another approach is to read the file byte-by-byte instead of every two bytes - then you would extend your states and have "expect_first_equals", "expect_second_equals", "expect_first_length_byte", and "expect_second_length_byte".)
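A possible sketch of the tell/seek idea, under several assumptions that are not from the original code: the layout is "==", a two-byte big-endian length, then the payload, and a record counts as "good" only if it is followed by another "==" or by end of file. The sub names are hypothetical. Note that a payload which itself contains "==" could fool the resync scan.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Try to read one record given its 4-byte header; returns undef if bad.
sub try_record {
    my ($fh, $head) = @_;
    return unless length($head) == 4 && substr($head, 0, 2) eq '==';
    my $len = unpack 'n', substr($head, 2, 2);
    my $got = read($fh, my $record, $len);
    return unless defined $got && $got == $len;
    # Peek at what follows: must be EOF or the next eyecatcher.
    my $after  = tell $fh;
    my $peeked = read($fh, my $peek, 2);
    return unless defined $peeked && ($peeked == 0 || $peek eq '==');
    seek($fh, $after, SEEK_SET) or die "seek failed: $!";
    return $record;
}

sub read_records_resync {
    my ($fh) = @_;
    my @records;
    while (1) {
        my $start = tell $fh;              # last known good position
        my $n = read($fh, my $head, 4);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                   # clean end of file
        my $record = try_record($fh, $head);
        if (defined $record) {
            push @records, $record;
            next;
        }
        # Bad record: rewind to one byte past the start and rescan.
        seek($fh, $start + 1, SEEK_SET) or die "seek failed: $!";
    }
    return @records;
}

# Second record claims 50 bytes but only has 4 bytes of junk.
my $data = '==' . pack('n', 2) . 'ab'
         . '==' . pack('n', 50) . 'junk'
         . '==' . pack('n', 3) . 'xyz';
open my $fh, '<:raw', \$data or die "cannot open in-memory file: $!";
print join(',', read_records_resync($fh)), "\n";    # prints "ab,xyz"
```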

        Great points!! I think reading one byte at a time is probably the safest. I am a quality assurance engineer and this code won't be run in production (so I am not reading through a socket where high performance is needed). I will be testing binary files (one at a time) created by my application and have around 60-70 test scenarios that can be automated with this parser, which ensures the binary file is constructed correctly.
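For reference, the byte-by-byte variant with the four states named above might look roughly like this. It is only a sketch: the big-endian length and the parse_bytewise name are assumptions, and stray bytes before an eyecatcher are simply skipped, which is how it copes with a file starting with "x==".

```perl
use strict;
use warnings;

use constant {
    EXPECT_FIRST_EQUALS       => 'expect_first_equals',
    EXPECT_SECOND_EQUALS      => 'expect_second_equals',
    EXPECT_FIRST_LENGTH_BYTE  => 'expect_first_length_byte',
    EXPECT_SECOND_LENGTH_BYTE => 'expect_second_length_byte',
};

sub parse_bytewise {
    my ($fh) = @_;
    my @records;
    my $expect = EXPECT_FIRST_EQUALS;
    my $len_hi;
    while (read($fh, my $byte, 1)) {
        if ($expect eq EXPECT_FIRST_EQUALS) {
            # Skip anything that is not the start of an eyecatcher.
            $expect = EXPECT_SECOND_EQUALS if $byte eq '=';
        }
        elsif ($expect eq EXPECT_SECOND_EQUALS) {
            $expect = $byte eq '='
                ? EXPECT_FIRST_LENGTH_BYTE
                : EXPECT_FIRST_EQUALS;
        }
        elsif ($expect eq EXPECT_FIRST_LENGTH_BYTE) {
            $len_hi = ord($byte);          # assumed high byte first
            $expect = EXPECT_SECOND_LENGTH_BYTE;
        }
        elsif ($expect eq EXPECT_SECOND_LENGTH_BYTE) {
            my $len = $len_hi * 256 + ord($byte);
            my $got = read($fh, my $record, $len);
            die "short record" unless defined $got && $got == $len;
            push @records, $record;
            $expect = EXPECT_FIRST_EQUALS;
        }
        else {
            die "unexpected state '$expect'";
        }
    }
    return @records;
}

my $data = 'x==' . pack('n', 2) . 'hi';
open my $fh, '<:raw', \$data or die "cannot open in-memory file: $!";
print join(',', parse_bytewise($fh)), "\n";    # prints "hi"
```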

        tell and seek look interesting. I will go through the docs and some examples. Thanks again for all your help. You've been great!!
