Re: how to write multi-line regex

Even if you could do it this way, I frankly wouldn’t. Instead, I would process this file one line at a time, “awk-style.” Use logic that gathers information from each line as it is read (or ignores the line, as the case may be). Then act on the accumulated information whenever you reach an appropriate sentinel, such as an ENDDEL; line, an empty line, or end-of-file.
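Here is a minimal sketch of that accumulate-then-flush loop. It is written in Python rather than Perl or awk, and it assumes (hypothetically) that each record is a run of `key: value` lines ended by `ENDDEL;`, an empty line, or end-of-file. The `records` function and the `key: value` format are illustrative only, not taken from the original question.

```python
from typing import Dict, Iterable, Iterator

SENTINEL = "ENDDEL;"  # end-of-record marker mentioned in the post


def records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Gather hypothetical 'key: value' lines; yield one dict per record."""
    current: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() in ("", SENTINEL):
            # Sentinel or empty line: act on what has been accumulated so far.
            if current:
                yield current
                current = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
        # Any other line is simply ignored in this sketch.
    if current:
        yield current  # end-of-file acts as the final sentinel
```

For example, `list(records(open("feed.txt")))` would read a file lazily, one line at a time. Nothing ever has to match across line boundaries.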

The trouble with a “clever multi-line regex” is not getting such a thing to at least appear to work on a handful of test cases. The trouble is that it will be well-nigh impossible to prove that it actually works for every well-formed file presented to it, let alone that it correctly rejects every file that is not well-formed. The next near-impossibility is maintaining it over time. You will need to keep adapting it to evolving conditions and to bugs in the third-party data feed that the third party just won’t ever get around to fixing. It happens. A lot.

The line-by-line approach, on the other hand, holds up well. One line marks the beginning of a potentially useful set of information, while another line (and/or end-of-file) marks the end. Between these two lines are:

(a) lines that contain more useful things;

(b) lines that you recognize but choose to ignore; and

(c) lines that you do not recognize. These mean either that your program is now insufficient or that the data vendor has once again screwed up.

Robust awk-style logic can be built this way, and if it is built well, it will last for years. Therefore, I strongly recommend that you take this approach here.
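The three kinds of in-between lines can be sketched as a small state machine. This is a Python sketch under hypothetical assumptions. A record opens with a `BEGIN <id>` line, carries `key = value` fields, and closes with `ENDDEL;` or end-of-file. Blank lines and `#` comments are recognized but ignored. The names `parse` and `FeedError`, and all of these line formats, are invented for illustration. Any unrecognized line raises an error instead of being silently skipped.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical line formats; a real data feed would define its own.
BEGIN = re.compile(r"^BEGIN\s+(\S+)$")     # opens a record and captures its id
FIELD = re.compile(r"^(\w+)\s*=\s*(.*)$")  # (a) useful: "key = value"
IGNORE = re.compile(r"^\s*(#.*)?$")        # (b) recognized but ignored: blank or comment
END = "ENDDEL;"                            # closes a record


class FeedError(ValueError):
    """Raised for (c): any line the parser does not recognize."""


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    record: Optional[Dict[str, str]] = None  # None means "between records"
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if record is None:
            m = BEGIN.match(line)
            if m:
                record = {"id": m.group(1)}
            elif not IGNORE.match(line):
                raise FeedError(f"line {lineno}: expected BEGIN, got {line!r}")
            continue
        if line == END:
            yield record
            record = None
        elif (m := FIELD.match(line)) is not None:
            record[m.group(1)] = m.group(2)
        elif IGNORE.match(line):
            pass
        else:
            raise FeedError(f"line {lineno}: unrecognized line {line!r}")
    if record is not None:
        yield record  # end-of-file also closes the last record
```

Every line must match one known pattern for the current state. A malformed feed therefore fails loudly at a specific line number instead of quietly producing a wrong match. When the vendor introduces a new line type, supporting it means adding one more branch.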


Perl Monk, Perl Meditation
	PerlMonks