Re: Parsing a file with parentheses to build a hash
by RichardK (Parson) on Nov 13, 2014 at 16:29 UTC
|
It looks like a simple grammar so I'd use a recursive descent parser, Parse::RecDescent for one.
This is just an example off the top of my head and is completely untested.
start : '(' section(s) ')'
section : '(' id entry(s) ')'
entry : '(' section | keypair ')'
keypair : '(' key value ')'
Update
Looking at this again, entry shouldn't have those terminal brackets -- it would only parse sections like '(id ((key value))', so just
entry : section | keypair
(I did say this was just off the top of my head !) | [reply] [d/l] [select] |
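For readers who want to see what the corrected grammar does without installing anything, here is a minimal hand-written recursive descent sketch in core Perl. It is not Parse::RecDescent; the sub names (tokenize, parse_entry, parse_start) and the sample input are made up for illustration, and it assumes names and values never contain whitespace or parentheses.

```perl
use strict;
use warnings;

# Split the input into '(' , ')' and bare-word tokens; whitespace is skipped.
sub tokenize {
    my ($text) = @_;
    return [ $text =~ /([()]|[^\s()]+)/g ];
}

# entry : section | keypair
# Both start with '(' and a name; the token after the name decides which
# one it is. Returns a (name, value) pair.
sub parse_entry {
    my ($tokens) = @_;
    my $open = shift @$tokens;
    die "expected '('\n" unless defined $open && $open eq '(';
    my $name = shift @$tokens;
    die "expected a name after '('\n"
        unless defined $name && $name !~ /^[()]$/;

    my $value;
    if (@$tokens && $tokens->[0] eq '(') {
        # section : '(' id entry(s) ')'
        $value = {};
        while (@$tokens && $tokens->[0] eq '(') {
            my ($k, $v) = parse_entry($tokens);
            $value->{$k} = $v;
        }
    }
    else {
        # keypair : '(' key value ')'
        $value = shift @$tokens;
        die "expected a value for '$name'\n"
            unless defined $value && $value !~ /^[()]$/;
    }
    my $close = shift @$tokens;
    die "expected ')' after '$name'\n" unless defined $close && $close eq ')';
    return ($name, $value);
}

# start : '(' section(s) ')'
# Simplification: any entry is accepted at the top level, not only sections.
sub parse_start {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $open = shift @$tokens;
    die "expected '(' at start\n" unless defined $open && $open eq '(';
    my %data;
    while (@$tokens && $tokens->[0] eq '(') {
        my ($k, $v) = parse_entry($tokens);
        $data{$k} = $v;
    }
    my $close = shift @$tokens;
    die "expected ')' at end\n" unless defined $close && $close eq ')';
    die "trailing input after final ')'\n" if @$tokens;
    return \%data;
}

my $data = parse_start('((SECTION (KEY1 Value1) (SUB (KEY2 Value2))))');
print "$data->{SECTION}{KEY1} $data->{SECTION}{SUB}{KEY2}\n";
```

Each sub mirrors one grammar rule, which is the same shape a generated parser would have; the difference is that whitespace handling and error messages are done by hand here.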
|
Oh, quick question. Will I need to barf the entire file into a single var or is there another way to do this?
Ex:
#!/path_to_perl
use Parse::RecDescent;
$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT = 1; # Give out hints to help fix problems.
my $grammar = <<'END_OF_GRAMMAR';
# What you said...
start : '(' section(s) ')'
section : '(' id entry(s) ')'
entry : '(' section | keypair ')'
keypair : '(' key value ')'
END_OF_GRAMMAR
my $text = '';
while (<BRD>)
{
chomp; # strip the newline from $_ before appending it
$text .= " $_";
}
my $parser = Parse::RecDescent->new($grammar) or die "Ha Ha, something is wrong with the syntax, good luck finding the issue!\n";
defined $parser->section($text) or die "It helps if there is a section to find...";
I'm just writing; I haven't tested anything yet, so I'm sure there are still a few issues in my understanding :)
Starting to read Why won't this basic Parse::RecDescent example work?
| [reply] [d/l] |
|
use File::Slurp;
my $text = read_file( $filename );
| [reply] [d/l] |
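If File::Slurp is not installed, the same thing can be done in core Perl by undefining the input record separator. A minimal sketch; slurp_file is a made-up helper name, and the file name is taken from the command line:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string, core Perl only.
sub slurp_file {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open '$filename': $!\n";
    local $/;            # no record separator: one read returns the whole file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Usage: perl slurp.pl some_file
my $filename = shift @ARGV;
print length(slurp_file($filename)), " characters read\n" if defined $filename;
```

Because $/ is localized inside the sub, line-by-line reads elsewhere in the script keep working as usual.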
|
|
|
Very interesting approach. It seems I just need to set up the grammar according to the specification and everything is done. I so like making things more difficult than this :) I will have to play with this just to see where it, or I, fall apart.
Great advice/pointer!
| [reply] |
|
So, I'm playing with this, and it looks like I'm going to use regular expressions to break out some of the unique sections so I can write a grammar for each one, since they don't all follow the same rules... I'm not sure how to put the info into a hash, and I'm getting an error that I don't know how to fix:
#!/usr/bin/perl
use Parse::RecDescent;
use Data::Dumper;
$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT = 1; # Give out hints to help fix problems.
our %DATA;
#my $module_grammar = <<'END_OF_MODULE_GRAMMAR';
#start : start_module
#start_module : '(module ' name module_values
#module_values : '(' section | keypair ')'
#section : '(' fp_text
#keypair : '(' key value ')'
#END_OF_MODULE_GRAMMAR
my $net_grammer = <<'END_OF_NET_GRAMMER';
start : nets(s)
nets : '(net ' node name ')'
{
$main::DATA{"NET"}{$item{'node'}} = $item{'name'};
print "$_\n" for @item{'node'};
}
node : m/\d+/
name : m/\S+/
END_OF_NET_GRAMMER
my $text;
open ( BRD, "<attiny84ap.kicad_pcb") or die("ha ha");
while (<BRD>)
{
my $new_line = $_;
chomp;
$text = "$text $new_line";
}
$text =~ s/\s+\(/\(/g; # remove white space in front of '('
$text =~ s/\(\s+/\(/g; # remove white space after '('
$text =~ s/\s+\)/\)/g; # remove white space in front of ')'
$text =~ s/\)\s+/\)/g; # remove white space after ')'
# This takes tooooo long silly
#$text =~ m/\(kicad_pcb\(version (\d+)\)\(host pcbnew \"(.*)\"\)\(general(.*)\)\(page (\S+)\)\(layers(.*)\)\(setup(.*)\)(\(net .*\))\(net_class/;
#my $version = $1;
#my $pcbnew_revision = $2;
#my $section_general = $3;
#my $page = $4;
#my $section_layers = $5;
#my $section_setup = $6;
#my $netlist_map = $7;
$text =~ m/^\(kicad_pcb\(version (\d+)\)(\(.*\))\)$/;
my $version = $1;
$text = $2;
#print "D - $version\n";
$text =~ m/^\(host pcbnew \"(.*)\"\)(\(general.*\))$/;
my $pcbnew_revision = $1;
$text = $2;
#print "D - $pcbnew_revision\n";
$text =~ m/^\(general(.*)\)(\(page .*\))$/;
my $section_general = $1;
$text = $2;
$section_general =~ s/\)\(/\)\n\(/g; # put newline between ')('
#print "D - $section_general\n";
$text =~ m/^\(page (\S+)\)(\(layers.*\))$/;
my $page = $1;
$text = $2;
#print "D - $page\n";
$text =~ m/^\(layers(\(.*\))\)(\(setup.*\))$/;
my $section_layers = $1;
$text = $2;
$section_layers =~ s/\)\(/\)\n\(/g; # put newline between ')('
#print "D - $section_layers\n";
$text =~ m/^\(setup(.*\)\))\)(\(net .*\))$/;
my $section_setup = $1;
$text = $2;
$section_setup =~ s/\)\(/\)\n\(/g; # put newline between ')('
#print "D - $section_setup\n";
$text =~ m/^(\(net .*\))(\(net_class.*)$/;
my $netlist_map = $1;
$text = $2;
$netlist_map =~ s/\)\(/\)\n\(/g; # put newline between ')('
print "D - $netlist_map\n";
my $parser = Parse::RecDescent->new($net_grammar) or die "Bad grammar!\n";
defined $parser->start($netlist_map) or die "Text doesn't match";
foreach my $KEY (keys %($DATA{"NET"}))
{
print "$KEY\n";
}
exit;
The file it reads is in another thread; once I figure out how to link it here I will update this thread.
Here is the output I'm getting:
>./parse_kicad_pcb.pl
D - (net 0 "")
(net 1 +9V)
(net 2 /CLK)
(net 3 /DO)
(net 4 /Data_In)
(net 5 /SCL)
(net 6 /SDA)
(net 7 /SET_Horz)
(net 8 /SET_Vert)
(net 9 /~Horz_ON)
(net 10 /~RESET)
(net 11 /~Vert_ON)
(net 12 5V_ATTINY84P)
(net 13 GND)
(net 14 N-000001)
(net 15 N-0000018)
(net 16 N-0000019)
(net 17 N-000002)
(net 18 N-0000021)
(net 19 N-0000024)
(net 20 N-0000026)
(net 21 N-0000027)
(net 22 N-0000028)
(net 23 N-0000029)
(net 24 N-000003)
(net 25 N-0000030)
(net 26 N-0000031)
(net 27 N-0000032)
(net 28 N-0000034)
(net 29 N-0000036)
(net 30 N-0000037)
(net 31 N-0000038)
(net 32 N-0000039)
(net 33 N-0000040)
(net 34 N-0000041)
(net 35 N-0000042)
(net 36 N-0000043)
(net 37 N-0000044)
(net 38 N-0000045)
(net 39 N-0000046)
(net 40 N-0000047)
(net 41 N-0000048)
(net 42 N-0000049)
(net 43 N-0000050)
(net 44 N-000009)
Unknown starting rule (Parse::RecDescent::namespace000001::start) called
at ./parse_kicad_pcb.pl line 93.
Any clues that might help? Working on this in 30min chunks isn't helping either, silly other work that needs attention.... | [reply] [d/l] [select] |
|
Well, that fragment doesn't compile: $net_grammar doesn't exist, so you didn't get that error message from there ;)
Using strict and warnings will help spot lots of these problems.
But in general you're trying too hard; you've got to let the parser do what it's best at. It will handle all of that whitespace for you. Just (!) extend your grammar to handle all of the different sections and let the parser do its thing.
Maybe you need to review your parser theory and try out some simple examples first to get the hang of how RecDescent works. It is big and complex, and I don't use it often enough to keep all of the details in my head. I always have to experiment a bit to get it to do what I want.
Has KiCad published a formal grammar for its file format? It's worth a look anyway.
| [reply] |
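To illustrate the point about strict, here is a small sketch that reproduces the $net_grammer / $net_grammar mismatch from the script above. The snippet is compiled from a string only so that this demo can report the compile-time error instead of dying on it; in a real script you would simply put use strict; use warnings; at the top.

```perl
use strict;
use warnings;

# The code under test lives in a string so the compile error can be captured.
my $snippet = <<'END_SNIPPET';
use strict;
my $net_grammer = 'start : nets(s)';   # declared with an "e"
my $copy = $net_grammar;               # used with an "a"
1;
END_SNIPPET

my $ok = eval $snippet;                # undef if compilation fails
print $ok ? "compiled\n" : "strict caught it: $@";
```

Without strict, the misspelled variable is silently treated as an undefined global, so the parser constructor receives no grammar at all.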
Re: Parsing a file with parentheses to build a hash
by toolic (Bishop) on Nov 13, 2014 at 16:19 UTC
|
| [reply] |
Re: Parsing a file with parentheses to build a hash
by choroba (Cardinal) on Nov 14, 2014 at 09:57 UTC
|
You can use the Marpa parser:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Marpa::R2;
my $g = << '__G__';
lexeme default = latm => 1
:start ::= Hash
:default ::= action => itself
Hash ::= '(' Pairs ')' action => hash
Pairs ::= Pair+ action => pairs
Pair ::= '(' Key Value ')' action => pair
Key ::= String
Value ::= String
| Pairs
String ~ [^\s()]+
whitespace ~ [\s]+
:discard ~ whitespace
__G__
my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$g });
my $input = '(
(SECTION (Section_KEY Value1)
(Another_KEY1 Value2) (KEY2 value3)
(KEY3 Value4))
(NEW_SECTION
(SUB_SECTION
(KEY4 Value5))
(NEW_SUB_SECTION (KEY5 Value6)
)
)
)';
my $recce = 'Marpa::R2::Scanless::R'->new({ grammar => $grammar,
semantics_package => 'main',
});
$recce->read(\$input);
print Dumper $recce->value;
sub hash { $_[2] }
sub pairs { shift; +{ map @$_, @_ } }
sub pair { [ @_[2, 3] ] }
sub itself { $_[1] }
Output:
$VAR1 = \{
'SECTION' => {
'Section_KEY' => 'Value1',
'KEY2' => 'value3',
'Another_KEY1' => 'Value2',
'KEY3' => 'Value4'
},
'NEW_SECTION' => {
'NEW_SUB_SECTION' => {
'KEY5' => 'Value6'
},
'SUB_SECTION' => {
'KEY4' => 'Value5'
}
}
};
| [reply] [d/l] [select] |
|
I can see you have used this module more than once. It looks like it uses the Parse::RecDescent module. I'm going to read up on this one now. I've started to poke into the file structure and see that it's not as cut and dried as I expected... I started to remember this as I reviewed my old code... ugly. Anyway, I realized that I need to set up some matches for different keys because they actually have implied vars based on the key name. I'm guessing the module can handle this. I really want to learn this methodology since it seems much quicker to set up once I get my head wrapped around it. I'm going to drop a bit more of a real example.
| [reply] [d/l] |
|
| [reply] [d/l] |
Re: Parsing a file with parentheses to build a hash
by Anonymous Monk on Nov 13, 2014 at 19:29 UTC
|
| [reply] |
|
This is more like the way I do things, for better or worse. I'm going to give the recdescent module a go, and likely start a parallel approach like the one in the String Search link.
The KiCad guys are Pythonistas (I still like Perl more) and roll their eyes when I mention Perl :) But I like the software, so either I start to actively contribute via C++ (only 24h in a day, 30 lines of code for every line of Perl, and 4 kids = no go) or continue to hack down a different path...
| [reply] |
|
This is more like the way I do things, for better or worse. Here is more of the same, plus some more advanced approaches :) Re: Count Quoted Words, Re^2: POD style regex for inline HTML elements, marpa scanless, Re: print output from the text file. (marpa scanless dsl calculator), Re^2: Help with regular expression ( m/\G/gc ), JSON parser as a single Perl Regex, Re^2: Help with regular expression, perlfaq6#What good is \G in a regular expression?, RFC: A walkthrough from JSON ABNF to Regexp::Grammars, Re^2: parsing XML fragments (xml log files) with... a regex
| [reply] |
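For anyone who has not met the m/\G/gc idiom mentioned in those links, here is a minimal sketch applied to a few net entries from the output posted earlier in the thread. It assumes a net name is either a double-quoted string or a run of characters without whitespace or parentheses; that is a guess from the sample data, not something taken from a KiCad specification.

```perl
use strict;
use warnings;

my $text = '(net 0 "") (net 1 +9V)
  (net 2 /CLK)';

my %net;
pos($text) = 0;                       # start scanning at the beginning
while (1) {
    # /g with \G continues where the last match ended; /c keeps that
    # position when a match fails, so the next alternative can be tried.
    if ($text =~ /\G\s+/gc) {
        next;                         # skip whitespace between entries
    }
    elsif ($text =~ /\G\(\s*net\s+(\d+)\s+("[^"]*"|[^\s()]+)\s*\)/gc) {
        $net{$1} = $2;                # node number => net name
    }
    elsif ($text =~ /\G\z/gc) {
        last;                         # clean end of input
    }
    else {
        die "Unexpected input at offset " . pos($text) . "\n";
    }
}

print "$_ => $net{$_}\n" for sort { $a <=> $b } keys %net;
```

The scanner never modifies $text, so none of the whitespace-stripping substitutions are needed, and a bad entry is reported with its offset instead of silently failing to match.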