One of the keys to processing files like this is knowing how confident you can be in the location of any piece of data. The most likely things to throw this off are data that exceeds its usual field size and pushes an extra line into the output, or missing data that leaves the document shorter than you expect. Very few of your fields have tags to identify them, so position is going to be how you tell what you're looking at in any given spot in the document. I would use regexes and code very defensively, making sure dates look like dates, phone numbers look like phone numbers, and prices look like prices.
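To make the defensive approach concrete, here is a minimal Python sketch. I don't know your actual layout, so everything specific in it is an assumption: the three-line record, the field names, the line positions in `LAYOUT`, and the formats the patterns accept (US-style dates, phone numbers, and dollar prices). The idea is just to check the line count first, then check that each position holds something shaped like what you expect.

```python
import re

# Assumed formats; adjust these to match what the real files contain.
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")                      # 03/14/2019
PHONE_RE = re.compile(r"(?:\(\d{3}\) ?|\d{3}[-.])\d{3}[-.]\d{4}")   # (555) 123-4567
PRICE_RE = re.compile(r"\$?(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}")      # $1,234.56

# Hypothetical layout: (field name, zero-based line index, pattern).
LAYOUT = [
    ("order_date", 0, DATE_RE),
    ("phone", 1, PHONE_RE),
    ("total", 2, PRICE_RE),
]
EXPECTED_LINES = 3


def check_record(lines):
    """Return a list of problems; an empty list means the record looks sane."""
    problems = []
    # A wrong line count is the first hint that a field overflowed or is missing.
    if len(lines) != EXPECTED_LINES:
        problems.append(f"expected {EXPECTED_LINES} lines, got {len(lines)}")
    for name, index, pattern in LAYOUT:
        if index >= len(lines):
            problems.append(f"{name}: line {index} is missing")
            continue
        value = lines[index].strip()
        # fullmatch rejects values that merely contain something date-like, etc.
        if not pattern.fullmatch(value):
            problems.append(f"{name}: {value!r} does not look right")
    return problems


if __name__ == "__main__":
    shifted = ["03/14/2019", "overflow from a long field", "(555) 123-4567", "$12.99"]
    for problem in check_record(shifted):
        print(problem)
```

When an overflowing field pushes everything down a line, the phone and price checks fail together, which tells you the record has shifted rather than that one value is bad. The patterns only check shape, so a date like 99/99/2019 would still pass; if that matters, parse the value as well.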