http://www.perlmonks.org?node_id=1052630


in reply to File Parsing and Pattern Matching

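One approach is to read the records in paragraph mode rather than line by line, then pull out all of the information with a single regex. Each optional field sits in a look-ahead with the 0 or 1 quantifier (?), so a record still matches when CAUSE or EFFECT is missing.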

use strict;
use warnings;
use 5.014;

use Data::Dumper;

open my $inFH, q{<}, \ <<EOD or die $!;
// HEADER TAG
// VERSION TAG

TYPE VALUE1
EQUALS MAIN
I am useless text
CAUSE FAIL
EFFECT ERROR
ENDTYPE

TYPE VALUE2
EQUALS MAIN
I am useful test
ENDTYPE

TYPE VALUE3
EQUALS MAIN
CAUSE DEGRADED
ENDTYPE

TYPE VALUE4
EQUALS MAIN
EFFECT WARNING
ENDTYPE
EOD

my $rxExtract = qr {(?xs)
    TYPE\s ( \S+ )
    (?= .* (?: CAUSE\s ( \S+ ) ) )?
    (?= .* (?: EFFECT\s ( \S+ ) ) )?
};

my %results;
{
    # Paragraph mode: each read returns one blank-line-separated record.
    local $/ = q{};

    # Read and discard the header paragraph.
    scalar <$inFH>;

    while ( <$inFH> )
    {
        next unless m{$rxExtract};
        $results{ $1 } =
        {
            CAUSE  => defined $2 ? $2 : q{UNDEF},
            EFFECT => defined $3 ? $3 : q{UNDEF},
        };
    }
}

say qq{$_:$results{ $_ }->{ CAUSE },$results{ $_ }->{ EFFECT }}
    for sort keys %results;
print qq{\n};

print Data::Dumper
    ->new( [ \ %results ], [ qw{ *results } ] )
    ->Sortkeys( 1 )
    ->Dumpxs();

The results.

VALUE1:FAIL,ERROR
VALUE2:UNDEF,UNDEF
VALUE3:DEGRADED,UNDEF
VALUE4:UNDEF,WARNING

%results = (
             'VALUE1' => {
                           'CAUSE' => 'FAIL',
                           'EFFECT' => 'ERROR'
                         },
             'VALUE2' => {
                           'CAUSE' => 'UNDEF',
                           'EFFECT' => 'UNDEF'
                         },
             'VALUE3' => {
                           'CAUSE' => 'DEGRADED',
                           'EFFECT' => 'UNDEF'
                         },
             'VALUE4' => {
                           'CAUSE' => 'UNDEF',
                           'EFFECT' => 'WARNING'
                         }
           );
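A side note on the look-aheads: both are tried from the same position, just after the TYPE value, so the order in which CAUSE and EFFECT appear within a record should not matter. The following is only a minimal sketch to illustrate that. The extract_fields helper and the VALUE5 record are made up for the illustration and are not part of the script or the original data.

```perl
use strict;
use warnings;

# Same pattern as in the script, with the x and s flags set inline.
my $rxExtract = qr{(?xs)
    TYPE\s ( \S+ )
    (?= .* (?: CAUSE\s  ( \S+ ) ) )?
    (?= .* (?: EFFECT\s ( \S+ ) ) )?
};

# Hypothetical helper: returns ( type, cause, effect ) for one record,
# with q{UNDEF} standing in for a missing field, or an empty list
# if the record has no TYPE entry at all.
sub extract_fields
{
    my( $record ) = @_;
    return unless $record =~ $rxExtract;
    return ( $1, defined $2 ? $2 : q{UNDEF}, defined $3 ? $3 : q{UNDEF} );
}

# Made-up record with EFFECT ahead of CAUSE.
my $record = "TYPE VALUE5\nEQUALS MAIN\nEFFECT WARNING\nCAUSE DEGRADED\nENDTYPE\n";
print join( q{,}, extract_fields( $record ) ), qq{\n};
```

This should print VALUE5,DEGRADED,WARNING, the same fields as if CAUSE had come first.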

I hope this is of interest.

Cheers,

JohnGG