Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: File Parsing and Pattern Matching

by kcott (Archbishop)
on Sep 05, 2013 at 22:54 UTC ( [id://1052633]=note: print w/replies, xml ) Need Help??


in reply to File Parsing and Pattern Matching

G'day Mark.Allan,

Here's a solution that uses named capture buffers; as such, it requires Perl 5.10.0 or later.

#!/usr/bin/env perl use 5.010; use strict; use warnings; my $re = qr{ (?> ^TYPE \s+ (?<value>VALUE\d+) | ^CAUSE \s+ (?<cause>\w+) | ^[AE]FFECT \s+ (?<effect>\w+) ) }mx; { local $/ = "ENDTYPE\n"; while (<DATA>) { my %type = (value => '', cause => 'UNDEF', effect => 'UNDEF'); @type{keys %+} = values %+ while /$re/g; say "$type{value}:$type{cause},$type{effect}"; } } __DATA__ // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL AFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN AFFECT WARNING ENDTYPE TYPE VALUE5 EQUALS MAIN This comes after TYPE VALUE4 Strange effect before cause EFFECT EFFECT_FIRST CAUSE CAUSE_SECOND ENDTYPE

Output:

$ pm_multiline_parse.pl VALUE1:FAIL,ERROR VALUE2:UNDEF,UNDEF VALUE3:DEGRADED,UNDEF VALUE4:UNDEF,WARNING VALUE5:CAUSE_SECOND,EFFECT_FIRST

Notes:

  • This solution changes the concept of a record from the default (single line ending in a newline) to potentially many lines ending in "ENDTYPE\n" by modifying "$/" (see perlvar). This allows you to deal with each "TYPE ... ENDTYPE" block as a discrete entity.
  • Your sample input has a spelling mistake (i.e. AFFECT) which is corrected in your wanted output (i.e. EFFECT). In case your real data also contains that mistake, I've allowed for it in the regex (... [AE]FFECT ...).
  • I've added an addional "TYPE ... ENDTYPE" block to show that: the order of CAUSE and EFFECT lines don't matter; correctly spelled EFFECT in the input is handled correctly; and bogus VALUE, CAUSE and EFFECT text is ignored.
  • This solution is based on the sample input and the wanted output that you've shown. Note that %type is recreated on each iteration of the outer while loop; you may want to capture its data if you need to use it elsewhere in your code.

Finally, your sort will fail if your TYPEs exceed VALUE9. This short piece of code highlights the mistake you're making and how to fix it:

$ perl -Mstrict -Mwarnings -E ' my @x = qw{VALUE10 VALUE2 VALUE1}; say "*** Sorting Mistake ***"; say for sort @x; say "*** Sorting Correctly ***"; say for sort { substr($a, 5) <=> substr($b, 5) } @x; ' *** Sorting Mistake *** VALUE1 VALUE10 VALUE2 *** Sorting Correctly *** VALUE1 VALUE2 VALUE10

-- Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1052633]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (5)
As of 2024-04-23 15:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found