G'day Mark.Allan,
Here's a solution that uses named capture buffers; as such, it requires Perl 5.10.0 or later.
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
my $re = qr{ (?> ^TYPE \s+ (?<value>VALUE\d+)
| ^CAUSE \s+ (?<cause>\w+)
| ^[AE]FFECT \s+ (?<effect>\w+) ) }mx;
{
local $/ = "ENDTYPE\n";
while (<DATA>) {
my %type = (value => '', cause => 'UNDEF', effect => 'UNDEF');
@type{keys %+} = values %+ while /$re/g;
say "$type{value}:$type{cause},$type{effect}";
}
}
__DATA__
// HEADER TAG
// VERSION TAG
TYPE VALUE1 EQUALS MAIN
I am useless text
CAUSE FAIL
AFFECT ERROR
ENDTYPE
TYPE VALUE2 EQUALS MAIN
I am useful test
ENDTYPE
TYPE VALUE3 EQUALS MAIN
CAUSE DEGRADED
ENDTYPE
TYPE VALUE4 EQUALS MAIN
AFFECT WARNING
ENDTYPE
TYPE VALUE5 EQUALS MAIN
This comes after TYPE VALUE4
Strange effect before cause
EFFECT EFFECT_FIRST
CAUSE CAUSE_SECOND
ENDTYPE
Output:
$ pm_multiline_parse.pl
VALUE1:FAIL,ERROR
VALUE2:UNDEF,UNDEF
VALUE3:DEGRADED,UNDEF
VALUE4:UNDEF,WARNING
VALUE5:CAUSE_SECOND,EFFECT_FIRST
Notes:
-
This solution changes the concept of a record from the default (single line ending in a newline) to potentially many lines ending in "ENDTYPE\n" by modifying "$/" (see perlvar).
This allows you to deal with each "TYPE ... ENDTYPE" block as a discrete entity.
-
Your sample input has a spelling mistake (i.e. AFFECT) which is corrected in your wanted output (i.e. EFFECT).
In case your real data also contains that mistake, I've allowed for it in the regex (... [AE]FFECT ...).
-
I've added an addional "TYPE ... ENDTYPE" block to show that: the order of CAUSE and EFFECT lines don't matter; correctly spelled EFFECT in the input is handled correctly; and bogus VALUE, CAUSE and EFFECT text is ignored.
-
This solution is based on the sample input and the wanted output that you've shown.
Note that %type is recreated on each iteration of the outer while loop;
you may want to capture its data if you need to use it elsewhere in your code.
Finally, your sort will fail if your TYPEs exceed VALUE9.
This short piece of code highlights the mistake you're making and how to fix it:
$ perl -Mstrict -Mwarnings -E '
my @x = qw{VALUE10 VALUE2 VALUE1};
say "*** Sorting Mistake ***";
say for sort @x;
say "*** Sorting Correctly ***";
say for sort { substr($a, 5) <=> substr($b, 5) } @x;
'
*** Sorting Mistake ***
VALUE1
VALUE10
VALUE2
*** Sorting Correctly ***
VALUE1
VALUE2
VALUE10