http://www.perlmonks.org?node_id=1077002


in reply to Re: header footer
in thread header footer

Yes, the record is arbitrary in length. But like I said, the header and footer lengths are always 50 and 30. An individual record is only about 10 KB, but there are a great many records. The record size is not encoded in the header.

Re^3: header footer
by gupr1980 (Acolyte) on Mar 04, 2014 at 23:18 UTC
    So am I missing something in thinking this: read line by line; check if the pattern HDR exists, cut the header and write the rest of the line to outfile; write the following lines to outfile; keep going until the pattern FTR is found, then cut from FTR to the end of the line and write what precedes it to outfile. Would this not work?

      No, I don't think you're missing a thing, and it may look like:

      use strict;
      use warnings;

      while (<>) {
          # Remove a leading 50-character header, or "FTR" through end of line
          s/^HDR.{47}|\KFTR.+//;
          print;
      }

      In fact, it may be faster than substr on the 3GB file, but I'm not sure. Pattern matching to remove the header seems just fine. However, if FTR exists anywhere else in the record, a substitution will mess up the record--which is something substr will not do.
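To illustrate the substr point, here is a minimal sketch. It assumes one whole record (header, body, footer) is already held in a single scalar, and uses the 50 and 30 character lengths given earlier in the thread. The helper name strip_fixed is made up for this example. Because it cuts by position only, an "FTR" inside the body is left alone.

```perl
use strict;
use warnings;

# Lengths taken from the thread: 50-character header, 30-character footer.
my $HDR_LEN = 50;
my $FTR_LEN = 30;

# strip_fixed() is a hypothetical helper name, not code from the thread.
# It assumes $record holds exactly one header + body + footer.
sub strip_fixed {
    my ($record) = @_;
    # Too short to hold both header and footer: nothing to keep.
    return '' if length($record) < $HDR_LEN + $FTR_LEN;
    # Keep everything between the fixed-length header and footer.
    return substr($record, $HDR_LEN, length($record) - $HDR_LEN - $FTR_LEN);
}

# Sample record whose body happens to contain the string "FTR".
my $record = 'HDR' . ('h' x 47) . "data with FTR inside\n" . 'FTR' . ('f' x 27);
print strip_fixed($record);
```

The catch is that this only works once the record boundaries are known, which is the hard part when the record size is not encoded in the header.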

      would this not work?

      Translate all that into perl code and see. If it doesn't work, post the perl code; if you don't know how to translate that, let us know where you're stuck.

      I think the code posted above by kenosis (about 10 minutes before you posted this question) should be a pretty good start, if not the full answer. It sets the "input record separator" ($/) to "HDR" instead of new-line.

      On the first read, it'll just get "HDR", and output nothing. On each subsequent read, it will get a whole record (including the next occurrence of the string "HDR"), skip the first 47 characters (the rest of the header string), trim off the "FTR" and following text, and output just the remaining record content (including whatever line breaks it contains).
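The steps just described could be sketched roughly as follows. This is not kenosis's actual code, only an illustration of the same idea: the helper name strip_records and the sample data are made up, and it assumes "HDR" never occurs inside a record body. It also cuts at the first "FTR" in the record, so the caveat about "FTR" appearing in the data still applies.

```perl
use strict;
use warnings;

# strip_records() is a hypothetical helper, not code from the thread.
# It reads "HDR"-delimited chunks from $in and writes record bodies to $out.
sub strip_records {
    my ($in, $out) = @_;
    local $/ = 'HDR';                   # each read ends at the next "HDR"
    while (my $chunk = <$in>) {
        chomp $chunk;                   # drop the trailing "HDR" (current $/)
        next if length($chunk) < 47;    # first read is just "HDR": skip it
        my $body = substr($chunk, 47);  # skip the rest of the 50-char header
        $body =~ s/FTR.*\z//s;          # trim "FTR" and everything after it
        print {$out} $body;
    }
}

# Two sample records read from an in-memory filehandle.
my $data = join '',
    'HDR' . ('h' x 47) . "line one\nline two\n" . 'FTR' . ('f' x 27) . "\n",
    'HDR' . ('h' x 47) . "second record\n"      . 'FTR' . ('f' x 27) . "\n";
open my $in, '<', \$data or die "cannot open in-memory input: $!";
strip_records($in, \*STDOUT);
close $in;
```

Since only one record (about 10 KB) is held in memory at a time, this should cope with a large file, though I have not timed it against the line-by-line version.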