This may do what you want. It uses a positive look-behind assertion so that the pattern splits on a run of one or more newlines as long as the run is preceded by a newline. There is also a look-behind on the start of the string to cope with blank lines at the start of your data, in case you need that. If you do, there will be an empty record at the start of your array that you can shift away.
use strict;
use warnings;
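my $recordCt = 0;

# Slurp the whole DATA section, then split on runs of newlines that follow
# a newline (or the start of the data). The capture group keeps each run
# of separator newlines in the returned list.
my @records = do
{
    local $/;
    split m{(?x) (?: (?<=\A) | (?<=\n) ) (\n+) }, <DATA>;
};

# Print each element between '>' and '<' so trailing newlines are visible.
print qq{@{ [ sprintf qq{\n%2d >}, ++ $recordCt ] }$_<} for @records;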
__END__
Record 1 consists of
three lines
of data terminated by a newline

Record 2 has
just two lines terminated by a newline

Record 3 contains four lines
of data which
is very important
to the project, newline terminated again


Record 4 was preceded by two blank lines
and has two lines terminated by a newline
The output.
 1 >Record 1 consists of
three lines
of data terminated by a newline
<
 2 >
<
 3 >Record 2 has
just two lines terminated by a newline
<
 4 >
<
 5 >Record 3 contains four lines
of data which
is very important
to the project, newline terminated again
<
 6 >

<
 7 >Record 4 was preceded by two blank lines
and has two lines terminated by a newline
<
Note that records 2, 4 and 6 are the newline runs we preserved with the regex capture group, and that each data record finishes with a newline, as you can see from the positions of the '<'s.
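If you only want the data records, the captured newline runs and any empty leading record can be filtered out after the split. The following is a minimal sketch of that idea, not a definitive implementation. The helper name data_records and the sample string are made up for illustration, and it assumes that a real record never consists solely of newlines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): split with the same
# pattern, then discard the separators and any empty leading record.
sub data_records
{
    my( $text ) = @_;
    my @fields = split m{(?x) (?: (?<=\A) | (?<=\n) ) (\n+) }, $text;

    # Blank lines at the very start of the data leave an empty first field.
    shift @fields if @fields and $fields[ 0 ] eq q{};

    # Drop the captured separator runs, which contain nothing but newlines.
    return grep { ! m{\A\n+\z} } @fields;
}

# Made-up sample data: two leading blank lines, then three records.
my $text    = qq{\n\nfirst\nrecord\n\nsecond record\n\n\nthird\n};
my @records = data_records( $text );
printf qq{%2d >%s<\n}, $_ + 1, $records[ $_ ] for 0 .. $#records;
```

The shift is kept as a separate step to mirror the advice above; the grep then removes the separator runs, so only the three data records remain.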
I hope this is helpful.