arthur_accioly has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I'm trying to take a config file and convert it to a structure with id, parent, key, value (see the example below), so that I can generate SQL commands to insert the data into an Oracle table.
Example of config:
# This is a comment
feed_realtime_processor_pool = ( 11, 12 ) ;
dropout_detection_time_start = "17:00";
# Sometimes the config can have sub-structures
named_clients = (
{
name = "thread1";
user_threads = (
{ name = "realtime1"; cpu = 11; } # more comments
{ name = "realtime2"; cpu = 12; } # more comments
);
}
);
(...)
Converted to:
id,parent, key, value
01,null, 'feed_realtime_processor_pool', '11'
02,null, 'feed_realtime_processor_pool', '12'
03,null, 'dropout_detection_time_start', '17:00'
04,null, 'named_clients', null
05,04, 'name', 'thread1'
06,04, 'user_threads', null
07,06, 'name', 'realtime1'
08,06, 'cpu', '11'
09,06, 'name', 'realtime2'
10,06, 'cpu', '12'
So, the questions are:
1. Does anybody know of a lib/module that can do this? I couldn't find one.
2. If not, can somebody suggest how I should start this script, maybe using different modules to help me parse the config? Cleaning and parsing the config is the hardest part for me.
Thanks.
Re: Config file to tree format with id, parent, key, value
by choroba (Cardinal) on Nov 01, 2016 at 22:51 UTC
If you can't find a module to parse your config format, write your own parser. Marpa::R2 can help you in the task:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Marpa::R2;
my $input = << '__INPUT__';
# This is a comment
feed_realtime_processor_pool = ( 11, 12 ) ;
dropout_detection_time_start = "17:00";
# Sometimes the config can have sub-structures
named_clients = (
{
name = "thread1";
user_threads = (
{ name = "realtime1"; cpu = 11; } # more comments
{ name = "realtime2"; cpu = 12; } # more comments
);
}
);
__INPUT__
my $dsl = << '__DSL__';
lexeme default = latm => 1
:default ::= action => ::first
Config ::= Elements
Elements ::= Element+ action => grep_defined
Element ::= (Comment) action => empty
| Name (s eq s) Value action => [values]
| Name (s eq s) Value (semicolon s) action => [values]
Comment ::= (hash nonnl nl) action => empty
Name ::= alpha
Value ::= List
| String
| Num
| Struct
List ::= (lpar) Nums (rpar s semicolon s)
Nums ::= Num+ separator => comma action => listify
Num ::= (s) digits (s)
| (s) digits
| digits (s)
| digits
String ::= (qq) nqq (qq semicolon s) action => quote
Struct ::= (lpar s) InStructs (rpar semicolon s)
InStructs ::= InStruct+ action => grep_defined
InStruct ::= (lcurl s) Elements (rcurl s)
| (Comment s) action => empty
| Element
s ~ [\s]*
eq ~ '='
hash ~ '#'
nonnl ~ [^\n]*
nl ~ [\n]
alpha ~ [a-z_]+
lpar ~ '('
rpar ~ ')'
lcurl ~ '{'
rcurl ~ '}'
semicolon ~ ';'
comma ~ ','
digits ~ [\d]+
qq ~ '"'
nqq ~ [^"]+
__DSL__
sub listify { shift; [ @_ ] }
sub quote { qq("$_[1]") }
sub empty {}
sub grep_defined { shift; [ grep defined, @_ ] }
my $id = 1;
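sub show {
    my ($parent, $name, $elems) = @_;
    if (ref $elems->[0]) {
        # A list of elements: show each one under the same parent.
        show($parent, $name, $_) for @$elems;
    } elsif (ref $elems->[1]) {
        if (ref $elems->[1][0]) {
            # A nested structure: print the key with a null value, then recurse.
            say join ', ', $id, $parent, $elems->[0], 'null';
            show($id++, $elems->[0], $elems->[1]);
        } else {
            # A list of plain values: one row per value.
            for my $e (@{ $elems->[1] }) {
                say join ', ', $id++, $parent, $elems->[0], $e;
            }
        }
    } else {
        # A simple key = value pair.
        say join ', ', $id++, $parent, @$elems;
    }
}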
my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
show('null', q(), ${ $grammar->parse(\$input, 'main') });
Output:
1, null, feed_realtime_processor_pool, 11
2, null, feed_realtime_processor_pool, 12
3, null, dropout_detection_time_start, "17:00"
4, null, named_clients, null
5, 4, name, "thread1"
6, 4, user_threads, null
7, 6, name, "realtime1"
8, 6, cpu, 11
9, 6, name, "realtime2"
10, 6, cpu, 12
Note that the id of the last line is 10, not 9 as in your sample. Adding leading zeroes and aligning the columns left as an exercise for the OP.
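Since the stated goal is to generate SQL inserts, here is a minimal sketch of that last step. The table name `config_tree` and the column names `node_id`, `parent_id`, `cfg_key`, `cfg_value` are made up for illustration and do not come from the original question. The rows are hard-coded in the id, parent, key, value shape shown above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical table name (an assumption); adjust to the real schema.
my $table = 'config_tree';

# A few rows in the id, parent, key, value shape; undef stands for null.
my @rows = (
    [ 1, undef, 'feed_realtime_processor_pool', '11' ],
    [ 4, undef, 'named_clients',                undef ],
    [ 5, 4,     'name',                         'thread1' ],
);

# Quote a value as an SQL string literal, or emit NULL for undef.
sub sql_literal {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    (my $escaped = $value) =~ s/'/''/g;    # double any single quotes
    return "'$escaped'";
}

# Build one INSERT statement; the column names are hypothetical as well.
sub insert_statement {
    my ($id, $parent, $key, $value) = @_;
    return sprintf
        'INSERT INTO %s (node_id, parent_id, cfg_key, cfg_value) VALUES (%d, %s, %s, %s);',
        $table, $id, defined $parent ? $parent : 'NULL',
        sql_literal($key), sql_literal($value);
}

print insert_statement(@$_), "\n" for @rows;
```

If the script talks to the database directly through DBI, bound placeholders are usually a safer choice than building string literals by hand.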
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Wow, it's working!!!! Thanks a lot. Just an addendum: to install this module I had to install several extra dependencies that were not declared:
sudo cpan IPC::Cmd
sudo cpan Module::Build
sudo cpan Time::Piece
sudo cpan Marpa::R2
Thanks again!
IPC::Cmd and Time::Piece have been in core since 5.9.5 which is very nearly a decade ago. Many dists don't mention core modules as dependencies, perhaps because the authors (erroneously) assume that everyone has the core installed by default. Module::Build used to be in core and it's possible that the author of Marpa::R2 did not notice when it was dropped from core.
Which version of Perl are you running and on which operating system?
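One way to check the core status of those modules on a given installation is Module::CoreList, which is itself a core module. This is only a sketch: the values it reports depend on the version of Module::CoreList installed, so no output is shown here.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Module::CoreList;

# For each module, report the first perl release that shipped it
# and, if applicable, the release that dropped it from core.
for my $module (qw( IPC::Cmd Time::Piece Module::Build )) {
    my $first   = Module::CoreList->first_release($module);
    my $removed = Module::CoreList->removed_from($module);
    printf "%-14s first in core: %s, removed: %s\n",
        $module,
        defined $first   ? $first   : 'never',
        defined $removed ? $removed : 'not removed';
}
```

The `corelist` command-line tool that ships with the same distribution answers the same question from the shell.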
Hi Choroba, can I ask you one more thing? These config files can come in many different formats. I have two questions:
1. Is the order in which the entries appear important? For example, if "dropout_detection_time_start" comes before "feed_realtime_processor_pool", is this going to break my parser?
2. If some of the rules have no matching entry in the config file, is this going to break the parser? For example, if I have "another_realtime_processor_pool", is it going to be discarded? Obviously I'll write grammar rules for those extra parameters, but the configs can have different entries depending on what the user wants to put in them. Thanks.
- The order shouldn't be important in this case. You can see that both "dropout_detection_time_start" and "feed_realtime_processor_pool" are instances of Element, that combine to Elements in any order ( Element+ ).
- Have you noticed that the parser is universal? It doesn't look for "realtime_processor" anywhere, it searches for Name , so you can define anything you like (however, at the same time, it means the parser doesn't catch typos - if you have the exhaustive list of possible Names available, replace the general alpha by something more specific).
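If changing the grammar is not convenient, an alternative (not the grammar change suggested above) is to check the parsed names against a list after parsing. A minimal sketch, assuming the list of allowed names below, which is only taken from the sample config and is certainly incomplete:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumed list of valid names, taken from the sample config only.
my %known = map { $_ => 1 } qw(
    feed_realtime_processor_pool dropout_detection_time_start
    named_clients name user_threads cpu
);

# Return the names that are not in the allowed list.
sub unknown_names {
    my @names = @_;
    return grep { !$known{$_} } @names;
}

# 'cpy' is a deliberate typo for 'cpu'.
my @suspicious = unknown_names(qw( name cpy user_threads ));
print "Unknown name: $_\n" for @suspicious;
```

The names to check would come from the third column of the rows the parser produces.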
Re: Config file to tree format with id, parent, key, value
by perlancar (Hermit) on Nov 01, 2016 at 21:22 UTC
I'm assuming you do not need that exact format. Do you really need the id and parent to point to line numbers? As long as you can get and set the node values, it should not matter to know the id/parent. There are several formats that can express nested mapping/sequence and support comments, e.g. YAML and TOML. If you are willing to forgo comments, there's JSON. And there's always Perl itself (you specify "configuration" as Perl code).
Somewhat related formats include Apachish (Config::Apachish::Reader) and Config::General. And, you know, XML.
If you want to preserve comments after writing back to configuration file, there are Config::IniFiles or Config::IOD, but these are INI-like. (IOD can store hashes and arrays too, though).
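To illustrate the JSON option, here is a sketch of the sample config rewritten by hand as JSON and read with the core module JSON::PP. The JSON text is my own translation of the example in the question, not something the OP's application produces.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use JSON::PP;

# The sample config, rewritten by hand as JSON (an assumption about
# how the data would map; JSON has no comments).
my $json = <<'__JSON__';
{
  "feed_realtime_processor_pool": [11, 12],
  "dropout_detection_time_start": "17:00",
  "named_clients": [
    { "name": "thread1",
      "user_threads": [
        { "name": "realtime1", "cpu": 11 },
        { "name": "realtime2", "cpu": 12 }
      ] }
  ]
}
__JSON__

my $config = JSON::PP->new->decode($json);

# Nested values are reachable through ordinary hash and array references.
print $config->{named_clients}[0]{user_threads}[1]{name}, "\n";
```

The nesting survives as hash and array references, so each value can be traced back to its parent.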
Thanks for your answer. I don't want the comments; I actually want to get rid of them. I tested Config::General and it's interesting, but the problem is that when I have sub-structures, the lib just puts everything together and I can't split the entries or find the parent of each key-value pair...
Here's my test code:
#!/usr/bin/perl
use warnings;
use strict;
use Config::General;
my $conf = Config::General->new("fh.cfg");
my %config = $conf->getall;
my @myarray = ();
while( my( $key, $value ) = each %config ){
    print "$key: $value\n";
    if (ref($value) eq 'ARRAY'){
        @myarray = @$value;
        foreach (@myarray) {
            print "$_\n";
        }
    }
}
And here is the output, with repeated values (because "request_snapshot" appears many times in my config):
(...)
status_interval: 30;
exchange_is_active_time_end: "16:00";
dropout_detection_initial_interval_secs: 600;
request_snapshot: ARRAY(0xce97a8)
false;
false;
false;
false;
(...)
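For reference, once a config has been loaded into a nested Perl structure (by whatever parser), the parent of each key can be kept by walking the structure recursively. A minimal sketch under stated assumptions: the `%config` hash below is hand-built from the sample in the question rather than produced by Config::General, and keys are sorted only to make the row order repeatable.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $next_id = 1;

# Walk a nested hash and collect [id, parent, key, value] rows.
# A key holding hashes gets a row with an undef value, and the
# contents of those hashes become its children.
sub walk {
    my ($parent, $node, $rows) = @_;
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        my @items = ref($value) eq 'ARRAY' ? @$value : ($value);
        my @subs  = grep { ref($_) eq 'HASH' } @items;
        if (@subs) {
            my $id = $next_id++;
            push @$rows, [ $id, $parent, $key, undef ];
            walk($id, $_, $rows) for @subs;
        } else {
            push @$rows, [ $next_id++, $parent, $key, $_ ] for @items;
        }
    }
    return $rows;
}

# Hand-built stand-in for a parsed config (an assumption).
my %config = (
    dropout_detection_time_start => '17:00',
    named_clients => [ {
        name         => 'thread1',
        user_threads => [
            { name => 'realtime1', cpu => 11 },
            { name => 'realtime2', cpu => 12 },
        ],
    } ],
);

my $rows = walk(undef, \%config, []);
print join(', ', map { defined $_ ? $_ : 'null' } @$_), "\n" for @$rows;
```

Because a Perl hash does not remember insertion order, the ids will not follow the order of the original file unless the parser preserves that order some other way.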
Re: Config file to tree format with id, parent, key, value
by stevieb (Canon) on Nov 01, 2016 at 23:35 UTC
When you cross-post to numerous sites, please consider the fact that many people don't visit them all, and it can result in wasted duplicate efforts. Let each site know up-front where else you've posted the same question.
Note also that you've updated your SO post, but not this one, so people here working wouldn't have any reference to any changes somewhere else without said notice.
Good point. Guys, I tried posting on Stack Overflow, but I got better feedback here. I'm trying to test the Marpa example (I'm having some issues installing this module, but I'm still trying). The other post can be found on this link