Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

How to know where I am going wrong in writing the grammar

by PoorLuzer (Beadle)
on Apr 01, 2009 at 15:09 UTC ( #754731=perlquestion: print w/replies, xml ) Need Help??

PoorLuzer has asked for the wisdom of the Perl Monks concerning the following question:

I have data like

my $testData = <<'_EOGTESTA_'; RECORD #input_id 1210758171x001_0013 #output_id #input_type PTC #output_type PTC #addkey #source_id 01 #filename TTFILE01-0001-20080101000000 F ptc_record_length 00B6 F ptc_record_type B firstBlock F ptc_charging_end_time 20080604093721 F ptc_called_msrn_ton FF . F ptc_term_mcz_duration 060000 F ptc_term_mcz_change_direction . _EOGTESTA_
I tried the following code using RecDescent, but of course the grammar is wrong somewhere:

#!/usr/bin/perl -w use strict; use Parse::RecDescent; use Data::Dumper; # Enable warnings within the Parse::RecDescent module. $::RD_ERRORS = 1; # Make sure the parser dies when it encounters an er +ror $::RD_WARN = 1; # Enable warnings. This will warn on unused rules &c +. $::RD_HINT = 1; # Give out hints to help fix problems. my $grammar = <<'_EOGRAMMAR_'; RECORDSTART : /RECORD/ { print "\n[*] RECORDSTART"; } RECORDEND : /^\.$/ { print '\n[*] RECORDEND'; } fieldName : /[^ \t]+/ { print "\n[*] fieldName\n" } metaName : /[^ \t]+/ { print "\n[*] metaName\n" } # metaFieldValue: /^\.$/ # fieldValue: /^\.$/ # blockName : /^\.$/ # metaFieldValue: /.*$/ # fieldValue: /.*$/ # blockName : /.*$/ metaFieldValue: /.*/ { print "\n[*] metaFieldValue\n" } fieldValue: /.*/ { print "\n[*] fieldValue\n" } blockName : /.*/ { print "\n[*] blockName\n" } metaField : #/#/ metaName /[ \t]+/ metaFieldValue /#/ metaName metaFieldValue { print "\n[*] Got metafield named $me +taName" . $item{ metaName } . ' with value ' . $item{ metaFieldValue +} . "\n" } field : /^F[ \t]+/ fieldName /[ \t]+/ fieldValue { print '\n[*] Got field named ' . $item{ fieldName } +. ' with value ' . $item{ fieldValue } . '\n' } block : /^B/ blockName { print '\n[*] Got block named ' . $item{ blockName } +. ' with value ' . ':-P' . '\n' } recordBody : field(s) { print '\n[*] field(s)\n' } | block(s) { print '\n[*] block(s)\n' } | metaField(s) { print '\n[*] metaField(s)\n' } #startOfRecord: RECORDSTART recordBody(s /$/) RECORDEND startOfRecord: RECORDSTART recordBody RECORDEND | <error> _EOGRAMMAR_ #$skeletonPattern = "#input_type[ \t]*"; #my $metaFieldPattern = qr/[ \t]*#([^ \t]+)[ \t]+(.*)/o; # "#input_typ +e SCDR+", "#filename processed_01_20080616001403.cdr", etc #my $normalFieldPattern = qr/([ \t]*)([0-9]*)F[ \t]+([^ \t]+)[ \t]+([^ + \t\r\n]+)(.*)/; # "1F S_Diagnostic1 62" OR " F S_Diagnostic1 62" +OR " F S_Diagnostic1 62" are synonymous, etc print $testData, "\n\n"; my $parser = Parse::RecDescent->new($grammar); $parser->startOfRecord($testData) or die "Bad input!\n";

But all I get are unhelpful error messages like:

Variable "$errorprefix" is not available at C:/cpanfly/var/megalib/Par +se/RecDescent.pm line 2906. Use of uninitialized value $errorprefix in formline at C:/cpanfly/var/ +megalib/Parse/RecDescent.pm line 2850.

I am using ActivePerl 5.10 and the latest RecDescent from CPAN.

Any ideas how to get "trace" messages as and when the parsing gets done, or to get something more helpful? UPDATED Brought the grammar to something so that it at least runs - but I will come back to the original grammar later

Replies are listed 'Best First'.
Re: How to know where I am going wrong in writing the grammar
by jethro (Monsignor) on Apr 01, 2009 at 15:33 UTC

    This is the error message I'm getting, with perl 5.8.8 on linux:

    RECORD #input_id 1210758171x001_0013 #output_id #input_type PTC #output_type PTC #addkey #source_id 01 #filename TTFILE01-0001-20080101000000 F ptc_record_length 00B6 F ptc_record_type B firstBlock F ptc_charging_end_time 20080604093721 F ptc_called_msrn_ton FF . F ptc_term_mcz_duration 060000 F ptc_term_mcz_change_direction . Warning (line 27): Found an error marker (<error>) after an uncond +itional <error> (Hint: An unconditional <error> always causes the prod +uction containing it to immediately fail. An error mar +ker that follows an <error> will never be reached. Did y +ou mean to use <error?> instead?) ERROR (line 1): Invalid startOfRecord: Was expecting RECORDSTAR +T Bad input!

    So it might be a problem in connection with ActivePerl or your particular installation. Just a wild guess

    When I used RecDescent the last time I put only function calls into the parser code. The functions I put outside the parser into a different module so that I could get meaningful line numbers from the error messages.

Re: How to know where I am going wrong in writing the grammar
by ikegami (Pope) on Apr 01, 2009 at 15:39 UTC

    Looks like a bug in P::RD rather than a grammar problem, which is why the $::RD vars don't help. Could you provide a sample value for $testData?

    Update: Nevermind, it's there there but it was off the screen. Looking into it.

      It looks like it might be some interaction between Parse::RecDescent and ActivePerl. A quick google search for "parse::recdescent errorprefix" yielded a number of ppm build error logs and a couple of references to use utf8. It looks like the utf8 issue was fixed in an earlier release (of Parse::RecDescent), and it also doesn't look like the code is explicitly working with utf8.

      jethro showed that there's an issue with the grammar itself, so maybe PoorLuzer could try fixing up his grammar definition so that it passes, or whittling it down to the minimal case that causes the $errorprefix error.

        • Debian, 5.8.8, P::RD 1.94:

          Warning (line 27): Found an error marker (<error>) after an uncond +itional <error> (Hint: An unconditional <error> always causes the prod +uction containing it to immediately fail. An error mar +ker that follows an <error> will never be reached. Did y +ou mean to use <error?> instead?) ERROR (line 1): Invalid startOfRecord: Was expecting RECORDSTAR +T
        • WinXP, Active Perl 5.10.0 build 5001, P::RD 1.94: (repeated 3 times with different line numbers)

          Variable "$errortext" is not available at c:/progs/perl5100/site/lib/P +arse/RecDescent.pm line 2917. Variable "$errorprefix" is not available at c:/progs/perl5100/site/lib +/Parse/RecDescent.pm line 2917. Use of uninitialized value $errorprefix in formline at c:/progs/perl51 +00/site/lib/Parse/RecDescent.pm line 2850. Use of uninitialized value $errortext in formline at c:/progs/perl5100 +/site/lib/Parse/RecDescent.pm line 2850. Use of uninitialized value $errortext in formline at c:/progs/perl5100 +/site/lib/Parse/RecDescent.pm line 2852. :

          Repeated twice with different line numbers.

        • WinXP, Active Perl 5.10.0 build 5004, P::RD 1.94:

          same.

        • WinXP, Active Perl 5.10.0 build 5004, with P::RD 1.94 installed from CPAN instead of PPM:

          same.

        • WinXP, Active Perl 5.8.8, P::RD 1.94:

          Warning (line 27): Found an error marker (<error>) after an uncond +itional <error> (Hint: An unconditional <error> always causes the prod +uction containing it to immediately fail. An error mar +ker that follows an <error> will never be reached. Did y +ou mean to use <error?> instead?) ERROR (line 1): Invalid startOfRecord: Was expecting RECORDSTAR +T

        Looks like an incompatibility with 5.10.0.

        I guess I'll have to look at the actual code, but it'll have to wait.

        Update: I've never seen that error message before, so here's what diagnostics has to say about it:

        (W closure) During compilation, an inner named subroutine or eval +is attempting to capture an outer lexical that is not currently avail +able. This can happen for one of two reasons. First, the outer lexical m +ay be declared in an outer anonymous subroutine that has not yet been cr +eated. (Remember that named subs are created at compile time, while anony +mous subs are created at run-time.) For example, sub { my $a; sub f { $a } } At the time that f is created, it can't capture the current value +of $a, since the anonymous subroutine hasn't been created yet. Conversely +, the following won't give a warning since the anonymous subroutine +has by now been created and is live: sub { my $a; eval 'sub f { $a }' }->(); The second situation is caused by an eval accessing a variable tha +t has gone out of scope, for example, sub f { my $a; sub { eval '$a' } } f()->(); Here, when the '$a' in the eval is being compiled, f() is not curr +ently being executed, so its $a is not available for capture.
        Here is some reduced grammer and code that works - parses the metafields (prefix #) correctly, but bums out at the "normal" fields (prefix F)..

        Any ideas?

        #!/usr/bin/perl -w use strict; use warnings; #use diagnostics; use Parse::RecDescent; use Data::Dumper; # Enable warnings within the Parse::RecDescent module. $::RD_ERRORS = 1; # Make sure the parser dies when it encounters an er +ror $::RD_WARN = 1; # Enable warnings. This will warn on unused rules &c +. $::RD_HINT = 1; # Give out hints to help fix problems. #$::AUTOSTUB = 1; my $grammar = <<'_EOGRAMMAR_'; #{ our $errortext = ''; our $errorprefix = '';} RECORDSTART : /^RECORD\r*\n/ { print "\n[*] RECORDSTART -> " . $item[1]; $item[1]; } RECORDEND : /^\./ { print '\n[*] RECORDEND -> ' . $item[1]; $item[1]; } fieldName : /[^ \t\n]+/ { print "\n[*] fieldName -> $item[1]\n"; $item[1]; } metaName : /[^ \t\n]+\n?/ { $item[1]; } metaFieldValue: /([^\n]*)\n/ { $1; } fieldValue: /([^\n]*)\n/ { print "[*] fieldValue $item[1]\n"; $1; } field : /^F/ fieldName fieldValue { print "[*] Got field named \'" . $item{ fieldName } . '\' with + value \'' . $item{ fieldValue } . "\'\n"; print Data::Dumper->Dump([$text], ["fieldStuff"]); } metaField : /^\#/ metaName metaFieldValue { print "[*] Got metafield named \'" . $item{ metaName } . '\' w +ith value \'' . $item{ metaFieldValue } . "\'\n"; } recordBody : field(s) { print "\n[*] field(s)\n"; #print main::Dumper \@item; print Data::Dumper->Dump([$text], ["field(s)"]); } | metaField(s) { print "\n[*] metaField(s)\n"; #print main::Dumper \@item; print Data::Dumper->Dump([$text], ["metaField(s)"]); } | <error> #<error: I am confused in recordBody at $thisoffset!> #startOfRecord: RECORDSTART recordBody(s /$/) RECORDEND startOfRecord: RECORDSTART recordBody RECORDEND { $return = $item[1] +} | #<error> <error: I could not even parse a line line starting at $thiso +ffset!> _EOGRAMMAR_ #$skeletonPattern = "#input_type[ \t]*"; #my $metaFieldPattern = qr/[ \t]*#([^ \t]+)[ \t]+(.*)/o; # "#input_typ +e SCDR+", "#filename processed_01_20080616001403.cdr", etc #my $normalFieldPattern = qr/([ \t]*)([0-9]*)F[ \t]+([^ \t]+)[ \t]+([^ + \t\r\n]+)(.*)/; # "1F S_Diagnostic1 62" OR " F S_Diagnostic1 62" +OR " F S_Diagnostic1 62" are synonymous, etc my $testData = <<'_EOGTESTA_'; RECORD #input_id 91210758171x001_0013 #input_type PTC #output_type MTC #source_id 01 #filename TTFILE01-0001-20080101000000 #jingalama valuewith#inIt andaSpace F ptc_record_length 00B6 F ptc_charging_start_time 20090604093721 F ptc_charging_end_time 20080604093721 F ptc_called_msrn_ton FF F ptc_term_mcz_duration 060000 . _EOGTESTA_ my $testData1 = <<'_EOGTESTA_'; RECORD #input_id 91210758171x001_0013 #output_id #input_type PTC #output_type PTC #addkey #source_id 01 #filename TTFILE01-0001-20080101000000 F ptc_record_length 00B6 F ptc_record_type F ptc_charging_start_time 20090604093721 F ptc_charging_end_time 20080604093721 F ptc_called_msrn_ton FF F ptc_term_mcz_duration 060000 F ptc_term_mcz_change_direction . _EOGTESTA_ print $testData, "\n\n"; #<STDIN>; my $parser = Parse::RecDescent->new($grammar); $parser->startOfRecord($testData) or die "Bad input!\n";

        Output is:

        RECORD
        #input_id 91210758171x001_0013
        #input_type PTC
        #output_type MTC
        #source_id 01
        #filename TTFILE01-0001-20080101000000
        #jingalama valuewith#inIt andaSpace
        F ptc_record_length 00B6
        F ptc_charging_start_time 20090604093721
        F ptc_charging_end_time 20080604093721
        F ptc_called_msrn_ton FF
        F ptc_term_mcz_duration 060000
        .
        
        * RECORDSTART -> RECORD
        * Got metafield named 'input_id' with value '91210758171x001_0013'
        * Got metafield named 'input_type' with value 'PTC'
        * Got metafield named 'output_type' with value 'MTC'
        * Got metafield named 'source_id' with value '01'
        * Got metafield named 'filename' with value 'TTFILE01-0001-20080101000000'
        * Got metafield named 'jingalama' with value 'valuewith#inIt andaSpace'
        
        * metaField(s)
        $metaField(s) = 'F ptc_record_length 00B6
        F ptc_charging_start_time 20090604093721
        F ptc_charging_end_time 20080604093721
        F ptc_called_msrn_ton FF
        F ptc_term_mcz_duration 060000
        .
        ';
        
        

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://754731]
Approved by ELISHEVA
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (8)
As of 2019-10-18 04:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?