TStanley has asked for the wisdom of the Perl Monks concerning the following question:
I have a large report that I need to extract data from. The report can be broken down into records, with the start of each one looking similar to what is below:
REPORT HEADER ISCDAYRECAP-001 ISC001 ISC RECAP REPORT FOR STORE: 001 PAGE: 00 1 XTNDED MRKDWN FOR STORE: 12.00 R U N DEPT: GROCERY POST DATE: 07/14/2011 DATE/TIME: 07/14/2011 21:11:05 EXTEND MRKDWN REASON EXT. MRKDWN
I am using the following code to split the file out into the separate stores, but it splits out into two elements, with element 0 of the array being empty, and everything else within element 1:
    #!/usr/bin/perl -w
    use strict;

    open my $IN, "<", "QISC001" or die "Can not open QISC001: $!\n";
    my @records;
    my $data = do { local $/; <$IN>; };
    @records = split m|(?<=\n)(?=REPORT HEADER ISCDAYRECAP-\d{3})|, $data;
    close $IN;
One thing that I noticed is that when viewing the input file in vi (I am doing this on an HP-UX system), there is a ^L character at the start of each store, with the exception of the first one, so my guess is that the first part of the regex is incorrect. As always, suggestions/hints are welcome.
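For illustration, here is a minimal sketch of how that guess could be checked. It assumes the ^L shown by vi is a form feed (\f) sitting directly in front of each later REPORT HEADER line. In that case the character before the header is \f rather than \n, so the (?<=\n) lookbehind would never match there. The sample data is invented for the example, and the real file may be laid out differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data (not the real report): three records, with the
# second and third each preceded by a form feed (\f, shown as ^L in vi).
my $data = join '',
    "REPORT HEADER ISCDAYRECAP-001\nline for store 001\n",
    "\fREPORT HEADER ISCDAYRECAP-002\nline for store 002\n",
    "\fREPORT HEADER ISCDAYRECAP-003\nline for store 003\n";

# Split just before each header that follows either a newline or a form feed.
# The character class keeps the lookbehind at a fixed width of one character.
my @records = split /(?<=[\n\f])(?=REPORT HEADER ISCDAYRECAP-\d{3})/, $data;

# The form feed ends up at the tail of the preceding record, so trim it.
s/\f\z// for @records;

printf "%d records\n", scalar @records;    # prints "3 records"
```

If the form feed reliably marks every record boundary, splitting on /\f/ alone would be a simpler alternative, but that is also an assumption about the file.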
TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell
Replies are listed 'Best First'.
Re: Problem with a regex?
by Jim (Curate) on Jul 15, 2011 at 17:11 UTC
by TStanley (Canon) on Jul 15, 2011 at 18:22 UTC
Re: Problem with a regex?
by Anonymous Monk on Jul 15, 2011 at 17:21 UTC
by Jim (Curate) on Jul 15, 2011 at 17:52 UTC