PerlMonks  

Parsing and editing a configuration file

by eyepopslikeamosquito (Chancellor)
on Jun 23, 2019 at 02:35 UTC (#11101749)

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

Thrown a problem at work to edit a customer configuration file. Though I've whipped up something that seems to work, my Perl has become quite rusty, so I seek suggestions for better ways to do it.

Though the precise syntax of this configuration file is unknown, I've got an example configuration file, shown below. One approach is to write a formal parser. The other extreme is to hack out some regexes. I've taken a middle ground, crudely parsing the VIEW records by matching with \G in harness with the /gc regex match modifier, then applying substitution regexes to modify each VIEW record.

To keep the sample code smallish, assume the goal is to change field values of "REGION" to "LOCATION" and "RUBBISH" to "TRASH" but only if the RECORD value in the VIEW is either "ABC" or "XYZ". The test program is shown below:

# ptest.pl
#
# Sample program to edit a configuration file.
# This configuration file lacks a formal syntax specification;
# the example configuration below is all I've got.
# All fields should be treated case insensitively.
# The goal is to change field values of:
#   "REGION" to "LOCATION"
#   "RUBBISH" to "TRASH"
# but only if the RECORD value in the VIEW is either "ABC" or "XYZ".
# If you run with:
#   perl ptest.pl >1.tmp 2>2.tmp
# 1.tmp will contain original file, 2.tmp will contain the changed version.

use strict;
use warnings;

my $s_in = <<'GROK';
# comment line
VIEW View1
   RECORD "ABC"
   FIELD (
      FIELD "TYPEFROM"
      FIELD "REGION"
      FIELD "RUBBISH"
   )
   INTERVAL 600 SECONDS
END_VIEW
VIEW View2
   RECORD "HELLO"
   FIELD (
      FIELD "TYPETO"
      FIELD "REGION"
      FIELD "RUBBISH"
   )
   INTERVAL 700 SECONDS
END_VIEW
# random line 1
VIEW View3
   RECORD "XYZ"
   FIELD (
      FIELD "FLD1"
      FIELD "Region"
      # random line 2
      FIELD "Rubbish"
   )
   INTERVAL 800 SECONDS
END_VIEW
# random line 3
GROK

# Ensure properly newline terminated
substr( $s_in, -1 ) ne "\n" and $s_in .= "\n";

my @recs = ( 'ABC', 'XYZ' );
my %fldmap = (
    REGION  => 'LOCATION',
    RUBBISH => 'TRASH',
);
my $fldstr = join '|', keys %fldmap;

print $s_in;

my $s_out;
while (1) {
    # Extract VIEW ... END_VIEW block
    if ( $s_in =~ /\G(^[ \t]*\bVIEW\b.*?^[ \t]*\bEND_VIEW\b)/msgic ) {
        my $view = $1;
        # Check for matching RECORD in VIEW block
        if ( $view =~ /^[ \t]*\bRECORD\b[ \t]*"(.*?)"/mi ) {
            my $rec = $1;
            if ( length($rec) && grep( /^\Q$rec\E$/i, @recs ) ) {
                # Translate fields
                $view =~ s/(\bFIELD\b[ \t]*?")($fldstr)"/$1 . $fldmap{uc($2)} . '"'/gie;
            }
        }
        $s_out .= $view;
    }
    elsif ( $s_in =~ /\G(.*\n)/gc ) {
        $s_out .= $1;
    }
    else {
        last;
    }
}
warn $s_out;

Updated: Minor changes made to code: ensured input file properly newline terminated plus very minor tweaks to regex.

Re: Parsing and editing a configuration file
by holli (Abbot) on Jun 23, 2019 at 06:41 UTC
    One approach is to write a formal parser. The other extreme is to hack out some regexes.
    How about both?

    From the docs:
    This technique makes it possible to use regexes to recognize complex, hierarchical--and even recursive--textual structures. The problem is that Perl 5.10 doesn't provide any support for extracting that hierarchical data into nested data structures. In other words, using Perl 5.10 you can match complex data, but not parse it into an internally useful form. An additional problem when using Perl 5.10 regexes to match complex data formats is that you have to make sure you remember to insert whitespace-matching constructs (such as \s*) at every possible position where the data might contain ignorable whitespace. This reduces the readability of such patterns, and increases the chance of errors (typically caused by overlooking a location where whitespace might appear). The Regexp::Grammars module solves both those problems.

    See Re: Store data into array by looping? for an example case similar to yours.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Parsing and editing a configuration file
by tybalt89 (Parson) on Jun 23, 2019 at 13:57 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11101749
    use warnings;

    # "the precise syntax of this configuration file is unknown" <-- FIXME

    my $s_in = <<'GROK';
    # comment line
    VIEW View1
       RECORD "ABC"
       FIELD (
          FIELD "TYPEFROM"
          FIELD "REGION"
          FIELD "RUBBISH"
       )
       INTERVAL 600 SECONDS
    END_VIEW
    VIEW View2
       RECORD "HELLO"
       FIELD (
          FIELD "TYPETO"
          FIELD "REGION"
          FIELD "RUBBISH"
       )
       INTERVAL 700 SECONDS
    END_VIEW
    # random line 1
    VIEW View3
       RECORD "XYZ"
       FIELD (
          FIELD "FLD1"
          FIELD "Region"
          # random line 2
          FIELD "Rubbish"
       )
       INTERVAL 800 SECONDS
    END_VIEW
    # random line 3
    GROK

    print $s_in;

    print STDERR $s_in =~ s/^VIEW\b.*?^END_VIEW\b/
      my $block = $&;
      $block =~ m{^\h*RECORD\h+"(ABC|XYZ)"}im
        ? $block =~ s!^\h*FIELD\s+\K"REGION"!"LOCATION"!gimr
          =~ s!^\h*FIELD\s+\K"RUBBISH"!"TRASH"!gimr
        : $block
      /gemsr;
Re: Parsing and editing a configuration file
by LanX (Archbishop) on Jun 23, 2019 at 17:02 UTC
    The main problem for me is that the

    > configuration file lacks a formal syntax specification

    IMHO it's too dangerous to be simply done with a fancy regex.

    You need validation to catch wrong assumptions and errors in the input.

    For instance it's unclear

    • if entries are unique?
    • if entries need to be ordered?
    • how many nesting levels are possible?
    • if indentation matters?
    • if "random lines" are #comments or "unknown" entries or just junk?
    • if there are errors which didn't show up yet?
    • if the quoting matters?

    I'd personally go for maintainability over speed then.

    Like writing an iterator which returns each "VIEW" as a nested data structure.

    Like this you can find and report errors and create a tidied version.
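    A minimal sketch of such an iterator, assuming the grammar is no richer than what the sample file shows. The name view_iterator and the hash keys (name, record, fields, interval) are made up for illustration, and anything the sketch does not recognise is reported with a line number instead of being silently passed through.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns a closure that yields one VIEW per call as a
# hash ref, or undef when the input is exhausted. Dies on unexpected lines.
sub view_iterator {
    my ($text) = @_;
    my @lines  = split /\n/, $text;
    my $lineno = 0;

    # Next non-blank, non-comment line, or undef at end of input
    my $nextline = sub {
        while (@lines) {
            my $line = shift @lines;
            $lineno++;
            next if $line =~ /^\s*(?:#.*)?$/;
            return $line;
        }
        return;
    };

    return sub {
        my $line = $nextline->();
        return unless defined $line;
        $line =~ /^\s*VIEW\s+(\w+)\s*$/i
            or die "line $lineno: expected VIEW, got '$line'\n";
        my %view     = ( name => $1, fields => [] );
        my $in_group = 0;    # true while inside FIELD ( ... )
        while ( defined( $line = $nextline->() ) ) {
            if ($in_group) {
                if    ( $line =~ /^\s*\)\s*$/ )                 { $in_group = 0 }
                elsif ( $line =~ /^\s*FIELD\s+"([^"]*)"\s*$/i ) { push @{ $view{fields} }, $1 }
                else  { die "line $lineno: unexpected line in FIELD group: '$line'\n" }
            }
            elsif ( $line =~ /^\s*END_VIEW\s*$/i )           { return \%view }
            elsif ( $line =~ /^\s*RECORD\s+"([^"]*)"\s*$/i ) { $view{record} = $1 }
            elsif ( $line =~ /^\s*FIELD\s+\(\s*$/i )         { $in_group = 1 }
            elsif ( $line =~ /^\s*INTERVAL\s+(.+?)\s*$/i )   { $view{interval} = $1 }
            else  { die "line $lineno: unexpected line in VIEW: '$line'\n" }
        }
        die "unexpected end of input inside VIEW $view{name}\n";
    };
}

# Usage: walk all views of a small sample and list their fields
my $next_view = view_iterator(<<'CFG');
# comment line
VIEW View1
   RECORD "ABC"
   FIELD (
      FIELD "REGION"
      FIELD "RUBBISH"
   )
   INTERVAL 600 SECONDS
END_VIEW
CFG
while ( my $view = $next_view->() ) {
    print "$view->{name} ($view->{record}): @{ $view->{fields} }\n";
}
```

    Note that this sketch throws comments away, so it suits validating and tidying; to edit a file in place while preserving stray lines, the iterator would have to keep them as well.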

    Actually it's not too difficult to define your own grammar as a DSL of subs

    my $iterator =
      block start => qr/^VIEW \s+ (?<name>\w+)/x ,
            stop  => qr/^END_VIEW/ ,
            hash  => sub {
                entry "RECORD";
                group "FIELD", array => sub { entries "FIELD" };
                entry "INTERVAL";
            } ;

    group() would be a special case of block() with

    start => qr/^ \s* FIELD \s+ \( \s* $/x ,
    stop  => qr/^ \s* \) \s* $/x,

    Inside those DSL subs I'd use a nextline() iterator, which does single readlines, handles empty lines and comments, and returns undef if the stop condition of the upper level is met.

    And you might consider using a q_entry() sub for automatic unquoting of "quoted" entries. Hence you are free to dynamically adapt to unknown conditions.

    NB This is untested code, I started to implement it but don't wanna spend my free Sunday on it, sorry ;-)

    HTH! :)

    Update

    if you wanna go this way, I'll share more insights.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
    FootballPerl is like chess, only without the dice

      my free Sunday

      That sounds really bad and doesn't forebode well. If an inherent property of a substantive is stripped from it for attachment as adjective, the property is prone to change or to go away.

      May your Sundays stay free.

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Parsing and editing a configuration file
by shmem (Chancellor) on Jun 23, 2019 at 15:04 UTC

    I'd just be lazy and use a flip-flop.

    #!/usr/bin/perl -n
    if(/RECORD\s+"(?:ABC|XYZ)"/ ... /END_VIEW/) {
        s/^\s*field\s+\K"region"/"LOCATION"/i;
        s/^\s*field\s+\K"rubbish"/"TRASH"/i;
    }
    print;

    Having the structure inline, I'd just split it into @lines and say for(@lines) { ... }

    More matching pairs can be introduced with nested if-blocks of the same form - if (m{...} ... m{...}) { ... }
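    For the inline case, the same flip-flop could be sketched roughly as follows. The sub name edit_views is invented for illustration, and the /i on the RECORD match is an assumption taken from the question's "case insensitively" requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the flip-flop edit to a config held in a string.
sub edit_views {
    my ($text) = @_;
    my @lines = split /^/m, $text;    # splitting on /^/ keeps the line endings
    for (@lines) {
        # True from a matching RECORD line through the next END_VIEW line.
        # The range operator keeps its state between calls, so the input
        # should end with a closed VIEW block.
        if ( /^\s*RECORD\s+"(?:ABC|XYZ)"/i ... /^\s*END_VIEW\b/i ) {
            s/^\s*FIELD\s+\K"REGION"/"LOCATION"/i;
            s/^\s*FIELD\s+\K"RUBBISH"/"TRASH"/i;
        }
    }
    return join '', @lines;
}

# Usage: only the view whose RECORD is "ABC" gets edited
print edit_views(<<'CFG');
VIEW View1
   RECORD "ABC"
   FIELD "Region"
END_VIEW
VIEW View2
   RECORD "HELLO"
   FIELD "REGION"
END_VIEW
CFG
```

    Like the one-liner, this relies on RECORD appearing before the FIELD lines within each VIEW.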

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Parsing and editing a configuration file
by Anonymous Monk on Jun 28, 2019 at 07:13 UTC
