PerlMonks  

Config file to tree format with id, parent, key, value

by arthur_accioly (Initiate)
on Nov 01, 2016 at 19:36 UTC ( [id://1175079]=perlquestion )

arthur_accioly has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to take a config file and convert it to a structure with id, parent, key, value (see the example below), so that I can then generate SQL commands to insert the rows into an Oracle table.

Example of config:

    # This is a comment
    feed_realtime_processor_pool = ( 11, 12 ) ;
    dropout_detection_time_start = "17:00";
    # Sometimes the config can have sub-structures
    named_clients = (
        {
            name = "thread1";
            user_threads = (
                {
                    name = "realtime1";
                    cpu = 11;
                } # more comments
                {
                    name = "realtime2";
                    cpu = 12;
                } # more comments
            );
        }
    );
    (...)

Converting:

    id,parent, key, value
    01,null, 'feed_realtime_processor_pool', '11'
    02,null, 'feed_realtime_processor_pool', '12'
    03,null, 'dropout_detection_time_start', '17:00'
    04,null, 'named_clients', null
    05,04, 'name', 'thread1'
    06,04, 'user_threads', null
    07,06, 'name', 'realtime1'
    08,06, 'cpu', '11'
    09,06, 'name', 'realtime2'
    10,06, 'cpu', '12'

So, the questions are:

1. Does anybody know a library/module that can do this? I couldn't find one.

2. If not, can somebody suggest how I should start this script, perhaps using different modules to help me parse the config? Cleaning and parsing the config is the hardest part for me.

Thanks.
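
For the SQL step mentioned in the question, here is a minimal sketch of turning id, parent, key, value rows into INSERT statements. The table name config_tree and the column names parent_id, cfg_key and cfg_value are hypothetical; they are not from the original post and need to be replaced with the real schema.

```perl
use strict;
use warnings;

# Hypothetical table name; adjust to the real schema.
my $TABLE = 'config_tree';

# Quote a value as an SQL string literal, or emit NULL for undef.
sub sql_literal {
    my ($v) = @_;
    return 'NULL' unless defined $v;
    (my $escaped = $v) =~ s/'/''/g;    # double any embedded single quotes
    return "'$escaped'";
}

# Build one INSERT statement from an [id, parent, key, value] row.
# The column names are assumptions made for this sketch.
sub insert_for {
    my ($id, $parent, $key, $value) = @{ $_[0] };
    return sprintf
        'INSERT INTO %s (id, parent_id, cfg_key, cfg_value) VALUES (%d, %s, %s, %s);',
        $TABLE, $id, (defined($parent) ? $parent : 'NULL'),
        sql_literal($key), sql_literal($value);
}

my @rows = (
    [ 1, undef, 'feed_realtime_processor_pool', '11' ],
    [ 4, undef, 'named_clients',                undef ],
    [ 5, 4,     'name',                         'thread1' ],
);

my @statements = map { insert_for($_) } @rows;
print "$_\n" for @statements;
```

When the rows are inserted from the same script rather than written to a file, DBI placeholders are usually a safer choice than building the literals by hand.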

Replies are listed 'Best First'.
Re: Config file to tree format with id, parent, key, value
by choroba (Cardinal) on Nov 01, 2016 at 22:51 UTC
    If you can't find a module to parse your config format, write your own parser. Marpa::R2 can help you in the task:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use feature qw{ say };

        use Marpa::R2;

        my $input = << '__INPUT__';
        # This is a comment
        feed_realtime_processor_pool = ( 11, 12 ) ;
        dropout_detection_time_start = "17:00";
        # Sometimes the config can have sub-structures
        named_clients = (
            {
                name = "thread1";
                user_threads = (
                    {
                        name = "realtime1";
                        cpu = 11;
                    } # more comments
                    {
                        name = "realtime2";
                        cpu = 12;
                    } # more comments
                );
            }
        );
        __INPUT__

        my $dsl = << '__DSL__';

        lexeme default = latm => 1
        :default ::= action => ::first

        Config    ::= Elements
        Elements  ::= Element+                             action => grep_defined
        Element   ::= (Comment)                            action => empty
                    | Name (s eq s) Value                  action => [values]
                    | Name (s eq s) Value (semicolon s)    action => [values]
        Comment   ::= (hash nonnl nl)                      action => empty
        Name      ::= alpha
        Value     ::= List | String | Num | Struct
        List      ::= (lpar) Nums (rpar s semicolon s)
        Nums      ::= Num+ separator => comma              action => listify
        Num       ::= (s) digits (s)
                    | (s) digits
                    | digits (s)
                    | digits
        String    ::= (qq) nqq (qq semicolon s)            action => quote
        Struct    ::= (lpar s) InStructs (rpar semicolon s)
        InStructs ::= InStruct+                            action => grep_defined
        InStruct  ::= (lcurl s) Elements (rcurl s)
                    | (Comment s)                          action => empty
                    | Element

        s         ~ [\s]*
        eq        ~ '='
        hash      ~ '#'
        nonnl     ~ [^\n]*
        nl        ~ [\n]
        alpha     ~ [a-z_]+
        lpar      ~ '('
        rpar      ~ ')'
        lcurl     ~ '{'
        rcurl     ~ '}'
        semicolon ~ ';'
        comma     ~ ','
        digits    ~ [\d]+
        qq        ~ '"'
        nqq       ~ [^"]+

        __DSL__

        sub listify { shift; [ @_ ] }
        sub quote { qq("$_[1]") }
        sub empty {}
        sub grep_defined { shift; [ grep defined, @_ ] }

        my $id = 1;
        sub show {
            my ($parent, $name, $elems) = @_;
            if (ref $elems->[0]) {
                show($parent, $name, $_) for @$elems;
            } elsif (ref $elems->[1]) {
                if (ref $elems->[1][0]) {
                    say join ', ', $id, $parent, $elems->[0], 'null';
                    show($id++, $elems->[0], $elems->[1]);
                } else {
                    for my $e (@{ $elems->[1] }) {
                        say join ', ', $id++, $parent, $elems->[0], $e;
                    }
                }
            } else {
                say join ', ', $id++, $parent, @$elems;
            }
        }

        my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
        show('null', q(), ${ $grammar->parse(\$input, 'main') });

    Output:

        1, null, feed_realtime_processor_pool, 11
        2, null, feed_realtime_processor_pool, 12
        3, null, dropout_detection_time_start, "17:00"
        4, null, named_clients, null
        5, 4, name, "thread1"
        6, 4, user_threads, null
        7, 6, name, "realtime1"
        8, 6, cpu, 11
        9, 6, name, "realtime2"
        10, 6, cpu, 12

    Note that the id of the last line is 10, not 9 as in your sample. Adding leading zeroes and aligning the columns left as an exercise for the OP.
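
    One possible way to do that exercise is with sprintf. This is only a sketch: the helper name format_row and the column widths are my own choices for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Format one row with a zero-padded id and left-aligned columns.
# The widths (2, 4, 30) are arbitrary choices for this sketch.
sub format_row {
    my ($id, $parent, $key, $value) = @_;
    # Pad numeric parents like the ids; leave the string 'null' alone.
    $parent = sprintf '%02d', $parent if $parent =~ /^\d+$/;
    return sprintf '%02d, %-4s, %-30s, %s', $id, $parent, $key, $value;
}

print format_row(1, 'null', 'feed_realtime_processor_pool', 11), "\n";
print format_row(5, 4, 'name', '"thread1"'), "\n";
```

    In the script above, each "say join ', ', ..." call could pass its list to a helper like this one instead.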

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Wow, it's working! Thank you very much. Just an addendum: to install this module I had to install several extra dependencies that were not declared:

          sudo cpan IPC::Cmd
          sudo cpan Module::Build
          sudo cpan Time::Piece
          sudo cpan Marpa::R2

      Thanks again!

        IPC::Cmd and Time::Piece have been in core since 5.9.5 which is very nearly a decade ago. Many dists don't mention core modules as dependencies, perhaps because the authors (erroneously) assume that everyone has the core installed by default. Module::Build used to be in core and it's possible that the author of Marpa::R2 did not notice when it was dropped from core.

        Which version of Perl are you running and on which operating system?

      Hi Choroba, can I ask you one more thing? Those config files can come in many different formats. I have two questions:

      1. Is the order in which the entries appear important? For example, if "dropout_detection_time_start" comes before "feed_realtime_processor_pool", is this going to break my parser?

      2. If the config file contains entries that are not covered by the rules, is this going to break the parser? For example, if I have "another_realtime_processor_pool", is it going to be discarded? Obviously I'll create a grammar for those extra parameters, but there are situations where the configs can have different entries depending on what the user wants to put in the config.

      Thanks.

        1. The order shouldn't be important in this case. You can see that both "dropout_detection_time_start" and "feed_realtime_processor_pool" are instances of Element, that combine to Elements in any order ( Element+ ).

        2. Have you noticed that the parser is universal? It doesn't look for "realtime_processor" anywhere, it searches for Name , so you can define anything you like (however, at the same time, it means the parser doesn't catch typos - if you have the exhaustive list of possible Names available, replace the general alpha by something more specific).
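
        If changing the grammar is not convenient, a different route (not the grammar change suggested above) is to keep the general Name rule and check the parsed names against a list afterwards. A minimal sketch, where the %known list and the helper unknown_keys are hypothetical and exist only for illustration:

```perl
use strict;
use warnings;

# Hypothetical list of the names the config is allowed to contain.
my %known = map { $_ => 1 } qw(
    feed_realtime_processor_pool dropout_detection_time_start
    named_clients name user_threads cpu
);

# Return the names that are not in the list, each one once,
# in the order they were first seen.
sub unknown_keys {
    my %seen;
    return grep { !$known{$_} && !$seen{$_}++ } @_;
}

# The third name contains a typo ("procesor"), so it gets reported.
my @bad = unknown_keys(qw(name cpu feed_realtime_procesor_pool cpu));
print "unknown key: $_\n" for @bad;
```

        The names to check would be collected while walking the parse result, for example inside show.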

Re: Config file to tree format with id, parent, key, value
by perlancar (Hermit) on Nov 01, 2016 at 21:22 UTC

    I'm assuming you do not need that exact format. Do you really need the id and parent to point to line numbers? As long as you can get and set the node values, it should not matter to know the id/parent. There are several formats that can express nested mapping/sequence and support comments, e.g. YAML and TOML. If you are willing to forgo comments, there's JSON. And there's always Perl itself (you specify "configuration" as Perl code).

    Somewhat related formats include Apachish (Config::Apachish::Reader) and Config::General. And, you know, XML.

    If you want to preserve comments after writing back to configuration file, there are Config::IniFiles or Config::IOD, but these are INI-like. (IOD can store hashes and arrays too, though).

      Thanks for your answer. I don't want the comments; I actually want to get rid of them. I tested Config::General and it is interesting, but the problem is that when I have the sub-structures, the lib just puts everything together and I can't split or find the parents of each key-value pair. Here is my test code:

          #!/usr/bin/perl
          use warnings;
          use strict;
          use Config::General;

          my $conf = Config::General->new("fh.cfg");
          my %config = $conf->getall;
          my @myarray = ();
          while( my( $key, $value ) = each %config ){
              print "$key: $value\n";
              if (ref($value) eq 'ARRAY'){
                  @myarray = @$value;
                  foreach (@myarray) {
                      print "$_\n";
                  }
              }
          }

      And here is the output, with repeated values (because "request_snapshot" appears many times in my config):

          (...)
          status_interval: 30;
          exchange_is_active_time_end: "16:00";
          dropout_detection_initial_interval_secs: 600;
          request_snapshot: ARRAY(0xce97a8)
          false;
          false;
          false;
          false;
          (...)
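
      To keep track of the parents, a nested structure of hash and array references can be walked recursively, handing out an id per row. The sketch below is a minimal illustration on hand-written sample data, not on real Config::General output; the sub name walk and the row layout are assumptions made for this example.

```perl
use strict;
use warnings;

my $next_id = 1;

# Recursively walk a nested structure and collect [id, parent, key, value] rows.
# A hash ref becomes a parent row with an undef value; an array ref repeats
# the key once per item; anything else is a plain value row.
sub walk {
    my ($rows, $parent, $key, $value) = @_;
    if (ref($value) eq 'ARRAY') {
        walk($rows, $parent, $key, $_) for @$value;
    }
    elsif (ref($value) eq 'HASH') {
        my $id = $next_id++;
        push @$rows, [ $id, $parent, $key, undef ];
        walk($rows, $id, $_, $value->{$_}) for sort keys %$value;
    }
    else {
        push @$rows, [ $next_id++, $parent, $key, $value ];
    }
}

# Hand-written sample data standing in for a parsed config.
my %config = (
    request_snapshot => [ 'false', 'false' ],
    named_clients    => { name => 'thread1', cpu => 11 },
);

my @rows;
walk(\@rows, undef, $_, $config{$_}) for sort keys %config;
print join(', ', map { defined($_) ? $_ : 'null' } @$_), "\n" for @rows;
```

      Keys are sorted here only to make the output repeatable. A plain hash does not keep the order of the file, so the ids will not follow the original line order.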
Re: Config file to tree format with id, parent, key, value
by stevieb (Canon) on Nov 01, 2016 at 23:35 UTC

    When you cross-post to numerous sites, please consider the fact that many people don't visit them all, and it can result in wasted duplicate efforts. Let each site know up-front where else you've posted the same question.

    Note also that you've updated your SO post, but not this one, so people here working wouldn't have any reference to any changes somewhere else without said notice.

      Good point. Guys, I tried posting on Stack Overflow, but I got better feedback here. I'm trying to test the Marpa example (I'm having some issues installing this module, but I'm still trying). The other post can be found on this link
