Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

File Parsing and Pattern Matching

by Mark.Allan (Sexton)
on Sep 05, 2013 at 18:41 UTC ( #1052612=perlquestion: print w/replies, xml ) Need Help??

Mark.Allan has asked for the wisdom of the Perl Monks concerning the following question:

I have a question for you monksters out there, I want to parse the followin file below. Now from the contents of this file I want to match the VALUEx string as the key field. So VALUE1 VALUE2 VALUE3 VALUE4 all need to be key match. Adding to this. I want to associate attributes (CAUSE, AFFECT) for each type whether they exist or not in the TYPE

// HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL AFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN AFFECT WARNING ENDTYPE

So the OUTPUT from the above file should resemble the following

(TYPE):(CAUSE),(EFFECT) VALUE1:FAIL,ERROR VALUE2:UNDEF,UNDEF VALUE3:DEGRADED,UNDEF VALUE4:UNDEF,WARNING

UNDEF meaning that particular attribute was not defined in the TYPE for that specific VALUE

Below is a snippet of associated code so far which is not working because I need to somehow create an array of attributes and associate them with the key Type

my %results; open(FH,"<$weblog") || die ("cannot open file"); while (<FH>){ next if ($_ =~ /^\//); next if (/^(\s)*$/); if (/^TYPE\s(\S+)\s\S+\s\S+/){ $key = $1;} if (/^CAUSE\s(\S+)/){ $cause = $1);} else{ $cause = "UNDEF";} if (/^AFFECT\s(\S+)/){ $affect = $1;} else{ $affect = "UNDEF";} $results{$key}= join "," => $cause => $affect; } for (sort keys %results) { print "$_:$results{$key}\n";} close (FH);

Replies are listed 'Best First'.
Re: File Parsing and Pattern Matching
by toolic (Bishop) on Sep 05, 2013 at 18:57 UTC
    A hash-of-hashes might help (perldsc):
    use warnings; use strict; my %results; my $type; while (<DATA>) { next if m{^/}; next unless /\S/; if (/^TYPE\s(\S+)/) { $type = $1; $results{$type}{CAUSE } = 'UNDEF'; $results{$type}{AFFECT} = 'UNDEF'; } if (/^(CAUSE|AFFECT)\s(\S+)/) { $results{$type}{$1} = $2; } } for (sort keys %results) { print "$_:$results{$_}{CAUSE},$results{$_}{AFFECT}\n"; } __DATA__ // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL AFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN AFFECT WARNING ENDTYPE

    prints:

    VALUE1:FAIL,ERROR VALUE2:UNDEF,UNDEF VALUE3:DEGRADED,UNDEF VALUE4:UNDEF,WARNING
      Grammar Nazi version:
      use warnings; use strict; my %results; my $type; while (<DATA>) { next if m{^/}; next unless /\S/; if (/^TYPE\s(\S+)/) { $type = $1; $results{$type}{CAUSE } = 'UNDEF'; $results{$type}{EFFECT} = 'UNDEF'; } if (/^(CAUSE|EFFECT)\s(\S+)/) { $results{$type}{$1} = $2; } } for (sort keys %results) { print "$_:$results{$_}{CAUSE},$results{$_}{EFFECT}\n"; } __DATA__ // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL EFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN EFFECT WARNING ENDTYPE
Re: File Parsing and Pattern Matching
by kcott (Bishop) on Sep 05, 2013 at 22:54 UTC

    G'day Mark.Allan,

    Here's a solution that uses named capture buffers; as such, it requires Perl 5.10.0 or later.

    #!/usr/bin/env perl use 5.010; use strict; use warnings; my $re = qr{ (?> ^TYPE \s+ (?<value>VALUE\d+) | ^CAUSE \s+ (?<cause>\w+) | ^[AE]FFECT \s+ (?<effect>\w+) ) }mx; { local $/ = "ENDTYPE\n"; while (<DATA>) { my %type = (value => '', cause => 'UNDEF', effect => 'UNDEF'); @type{keys %+} = values %+ while /$re/g; say "$type{value}:$type{cause},$type{effect}"; } } __DATA__ // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL AFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN AFFECT WARNING ENDTYPE TYPE VALUE5 EQUALS MAIN This comes after TYPE VALUE4 Strange effect before cause EFFECT EFFECT_FIRST CAUSE CAUSE_SECOND ENDTYPE

    Output:

    $ pm_multiline_parse.pl VALUE1:FAIL,ERROR VALUE2:UNDEF,UNDEF VALUE3:DEGRADED,UNDEF VALUE4:UNDEF,WARNING VALUE5:CAUSE_SECOND,EFFECT_FIRST

    Notes:

    • This solution changes the concept of a record from the default (single line ending in a newline) to potentially many lines ending in "ENDTYPE\n" by modifying "$/" (see perlvar). This allows you to deal with each "TYPE ... ENDTYPE" block as a discrete entity.
    • Your sample input has a spelling mistake (i.e. AFFECT) which is corrected in your wanted output (i.e. EFFECT). In case your real data also contains that mistake, I've allowed for it in the regex (... [AE]FFECT ...).
    • I've added an addional "TYPE ... ENDTYPE" block to show that: the order of CAUSE and EFFECT lines don't matter; correctly spelled EFFECT in the input is handled correctly; and bogus VALUE, CAUSE and EFFECT text is ignored.
    • This solution is based on the sample input and the wanted output that you've shown. Note that %type is recreated on each iteration of the outer while loop; you may want to capture its data if you need to use it elsewhere in your code.

    Finally, your sort will fail if your TYPEs exceed VALUE9. This short piece of code highlights the mistake you're making and how to fix it:

    $ perl -Mstrict -Mwarnings -E ' my @x = qw{VALUE10 VALUE2 VALUE1}; say "*** Sorting Mistake ***"; say for sort @x; say "*** Sorting Correctly ***"; say for sort { substr($a, 5) <=> substr($b, 5) } @x; ' *** Sorting Mistake *** VALUE1 VALUE10 VALUE2 *** Sorting Correctly *** VALUE1 VALUE2 VALUE10

    -- Ken

Re: File Parsing and Pattern Matching
by johngg (Canon) on Sep 05, 2013 at 22:18 UTC

    Reading records in paragraph mode rather than line by line and pulling out all the information using a regex with look-aheads with the 0 or 1 quantifier.

    use strict; use warnings; use 5.014; use Data::Dumper; open my $inFH, q{<}, \ <<EOD or die $!; // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL EFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN EFFECT WARNING ENDTYPE EOD my $rxExtract = qr {(?xs) TYPE\s ( \S+ ) (?= .* (?: CAUSE\s ( \S+ ) ) )? (?= .* (?: EFFECT\s ( \S+ ) ) )? }; my %results; { local $/ = q{}; scalar <$inFH>; while ( <$inFH> ) { next unless m{$rxExtract}; $results{ $1 } = { CAUSE => defined $2 ? $2 : q{UNDEF}, EFFECT => defined $3 ? $3 : q{UNDEF}, }; } } say qq{$_:$results{ $_ }->{ CAUSE },$results{ $_ }->{ EFFECT }} for sort keys %results; print qq{\n}; print Data::Dumper ->new( [ \ %results ], [ qw{ *results } ] ) ->Sortkeys( 1 ) ->Dumpxs();

    The results.

    VALUE1:FAIL,ERROR VALUE2:UNDEF,UNDEF VALUE3:DEGRADED,UNDEF VALUE4:UNDEF,WARNING %results = ( 'VALUE1' => { 'CAUSE' => 'FAIL', 'EFFECT' => 'ERROR' }, 'VALUE2' => { 'CAUSE' => 'UNDEF', 'EFFECT' => 'UNDEF' }, 'VALUE3' => { 'CAUSE' => 'DEGRADED', 'EFFECT' => 'UNDEF' }, 'VALUE4' => { 'CAUSE' => 'UNDEF', 'EFFECT' => 'WARNING' } );

    I hope this is of interest.

    Cheers,

    JohnGG

Re: File Parsing and Pattern Matching
by hdb (Monsignor) on Sep 06, 2013 at 09:25 UTC

    Here is a short version. Thanks to everyone who helped me: Unexpected matching results.

    use strict; use warnings; my @results = do { local $/ = ''; # paragraph mode map { my $p=$_; [ map { $p=~/$_ (\S+)/?$1:'UNDEF' } qw/TYPE CA +USE AFFECT/ ] } grep { /TYPE/ } <DATA>; }; print "@$_\n" for @results; __DATA__ // HEADER TAG // VERSION TAG TYPE VALUE1 EQUALS MAIN I am useless text CAUSE FAIL AFFECT ERROR ENDTYPE TYPE VALUE2 EQUALS MAIN I am useful test ENDTYPE TYPE VALUE3 EQUALS MAIN CAUSE DEGRADED ENDTYPE TYPE VALUE4 EQUALS MAIN AFFECT WARNING ENDTYPE

      Some very helpful answers! Thank you everyonw who contributed

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1052612]
Approved by sundialsvc4
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (2)
As of 2019-06-16 10:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Is there a future for codeless software?



    Results (76 votes). Check out past polls.

    Notices?