http://www.perlmonks.org?node_id=674305


in reply to Reversible parsing (with Parse::RecDescent?)

I hadn't thought much about this topic before you brought it up, and I haven't researched it, so I'm making this up as I go along. However, I do have experience with parsing and with P::RD specifically, so hopefully you'll find it useful.


  • Is it necessarily possible to reverse the data through the grammar to get the command line?
  • or is this only possible for special grammars?
  • Is there something I need to do in my grammar to make this possible?

Yes, it's possible to deparse a parse tree, if the parse tree contains sufficient information. That means that your parser must return sufficient information to regenerate an acceptably similar source. (e.g. If your language accepts single-quoted and double-quoted string literals, you may not care which one is used in the deparsed text as long as the resulting text is equivalent to the original. You probably don't care about whitespace and comments either.) With P::RD, that is controlled by the grammar.

  • Is there a module that already reverses the parsing of Parse::RecDescent parsers?
  • Is there another module that makes this bi-directionality easier (I'd prefer not to learn another grammar specification syntax, but could)?
  • Is there a module that already does all this soup-to-nuts?

I'm afraid the grammar couldn't be the same as the one used by P::RD. Actions ({ ... }) and many directives (<...>) cannot be automatically reversed. Error checks would be different, and so on.

Now, something like BNF could be automatically reversed, so you could write a parser and deparser that accepted pure BNF. The problem with this approach is that the product returned by the parser would be of limited use. It would require further processing. That further processing is what P::RD lumps in with the grammar, and that's what's not automatically reversible.


Next, let's address some issues with the grammar.

Bugs:

Sketchy Code:

Stylistic:

    # build_parser.pl
    use strict;
    use warnings;

    use Parse::RecDescent qw( );

    my $grammar = <<'__END_OF_GRAMMAR__';

       {
          # Lexical pragmas and lexical variables
          # here are in the same scope as the sub
          # created for each grammar rule.
          use strict;
          use warnings;

          my %KNOWN_COMMANDS = map { $_ => 1 } qw(
             hap hccmrg hup hspp hpp hpg hdp hdlp hcp
          );
       }

       parse   : cmdline /\Z/ { $item[1] }

       cmdline : pkgcmd opt(s?) { [ @item[1,2] ] }

       pkgcmd  : IDENT pkgcmd_[ $item[1] ] { $item[1] }
       pkgcmd_ : { $KNOWN_COMMANDS{$arg[0]} }
               | <error:Unknown command>

       opt     : FLAG opt_ { [ @item[1,2] ] }
       opt_    : IDENT
               | { 1 }

       FLAG    : /-\w+/
       IDENT   : /\w+/

    __END_OF_GRAMMAR__

    Parse::RecDescent->Precompile($grammar, 'Parser')
       or die("Bad Grammar\n");

In this post, I show how to deparse the tree created by the above grammar. For more complex trees, it can be very useful to have each non-token rule return a blessed object. Then a method in each class would be responsible for deparsing its own node, instead of relying on Deparser to do it. I'll get to that in a separate post.

A quick note before continuing: you'll notice the parse tree doesn't differentiate between "-opt" and "-opt 1", so you'll get "-opt 1" when you deparse either one. This may be acceptable, as a case of being "acceptably similar" as mentioned above. If it's not acceptable, then the parse tree doesn't hold enough information as is.
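If the distinction does matter, one possible fix is to have the tree record a missing value as undef instead of 1, and have the deparser emit the flag alone in that case. The following is only a sketch under that assumption: the function names deparse_opt and deparse_cmdline are made up for illustration, and the grammar change mentioned in the comment is one way it might be done, not something taken from the grammar above.

```perl
use strict;
use warnings;

# Assumed tree shape: each option is [ $flag, $value ], where $value is
# undef when no value followed the flag. One (hypothetical) way to get
# that shape out of the grammar would be a rule such as:
#    opt : FLAG IDENT(?) { [ $item[1], $item[2][0] ] }

# Deparse one option, omitting the value when none was given.
sub deparse_opt {
    my ($node) = @_;
    my ($flag, $val) = @$node;
    return defined($val) ? "$flag $val" : $flag;
}

# Deparse a whole [ $pkgcmd, \@opts ] tree.
sub deparse_cmdline {
    my ($node) = @_;
    my ($pkgcmd, $opts) = @$node;
    return join ' ', $pkgcmd, map { deparse_opt($_) } @$opts;
}

print deparse_cmdline(
    [ 'hap', [ [ '-prompt', undef ], [ '-b', 'sknxharvest01' ] ] ]
), "\n";
```

With this shape, "-prompt" and "-prompt 1" produce different trees, so each deparses back to what was typed.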

Basically, I created a function for each rule. Each function resembles the associated rule, but deparses instead of parsing. Note that your grammar is trivial (it has no real choices in it), which is why you didn't need to use $item[0] (or some other constant) anywhere, and why you don't have to make any choices when deparsing.
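For a grammar that does have choices, the deparser needs to know which alternative produced each node. A rough sketch of one way to handle that, assuming each node carries its rule name as its first element (roughly what an action like { [ @item ] } would return, since $item[0] is the rule name). The rule names "opt" and "arg" and the function deparse_node are hypothetical and not part of the grammar above.

```perl
use strict;
use warnings;

# Assumed node shape: [ $rule_name, @children ].
# One handler per rule name; each receives the node's children.
my %DEPARSE = (
    opt => sub { my ($flag, $val) = @_; return "$flag $val" },
    arg => sub { my ($word) = @_;       return $word },
);

# Pick the handler from the node's tag and let it rebuild the text.
sub deparse_node {
    my ($node) = @_;
    my ($type, @children) = @$node;
    my $handler = $DEPARSE{$type}
        or die("Don't know how to deparse a '$type' node\n");
    return $handler->(@children);
}

print join(' ', 'hap', map { deparse_node($_) } (
    [ opt => '-b', 'sknxharvest01' ],
    [ arg => 'testfile' ],
)), "\n";
```

The dispatch table plays the role that the choice between productions plays in the parser: the tag stored in the tree is what makes the choice recoverable.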

    # Deparser.pm
    use strict;
    use warnings;

    package Deparser;

    # Text emitted between adjacent tokens.
    my $skip = ' ';

    # Prefix non-empty text with the separator.
    sub add_skip {
       if (length($_[0])) {
          return $skip . $_[0];
       } else {
          return '';
       }
    }

    # Deparse each element (passed via $_) and join the results.
    sub loop(&@) {
       my $cb = shift(@_);
       return join $skip, map { $cb->() } @_;
    }

    sub deparse { &cmdline }

    sub cmdline {
       my ($node) = @_;
       my ($pkgcmd, $opts) = @$node;
       my $text = pkgcmd($pkgcmd);
       $text .= add_skip(loop { opt($_) } @$opts);
       return $text;
    }

    sub pkgcmd {
       my ($node) = @_;
       return IDENT($node);
    }

    sub opt {
       my ($node) = @_;
       my ($flag, $val) = @$node;
       my $text = FLAG($flag);
       $text .= add_skip(IDENT($val));
       return $text;
    }

    sub FLAG  { return $_[0] }
    sub IDENT { return $_[0] }

    1;

As you can see, it's not too hard. Just make sure to reflect changes to your parser with changes to your deparser.
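One way to notice when the parser and deparser have drifted apart is a round-trip check: parse some text, deparse the tree, parse the result again, and compare the two trees. Here is a minimal sketch of that idea. The helpers same_tree and round_trips are names invented here, and the toy parse and deparse subs are stand-ins so the example runs without Parse::RecDescent.

```perl
use strict;
use warnings;

# Compare two trees made only of array refs and defined scalars
# (the shape produced by the grammar in this post).
sub same_tree {
    my ($x, $y) = @_;
    return 0 if ref($x) ne ref($y);
    return $x eq $y ? 1 : 0 if !ref($x);
    return 0 if @$x != @$y;
    for my $i (0 .. $#$x) {
        return 0 if !same_tree($x->[$i], $y->[$i]);
    }
    return 1;
}

# $parse and $deparse are code refs; $parse returns undef on failure.
sub round_trips {
    my ($parse, $deparse, $text) = @_;
    my $tree = $parse->($text);
    return 0 if !defined($tree);
    my $again = $parse->($deparse->($tree));
    return defined($again) && same_tree($tree, $again) ? 1 : 0;
}

# Toy stand-ins: split into words, join back with spaces.
my $toy_parse   = sub { [ split ' ', $_[0] ] };
my $toy_deparse = sub { join ' ', @{ $_[0] } };

print round_trips($toy_parse, $toy_deparse, 'hap -b sknxharvest01')
    ? "ok\n" : "not ok\n";
```

With the modules from this post, the two code refs would presumably be sub { $parser->parse($_[0]) } and \&Deparser::deparse. Comparing trees rather than text means "hap -b" and "hap -b 1" count as a successful round trip, which matches the "acceptably similar" standard discussed earlier.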

Now let's use the code.

    # test.pl
    use strict;
    use warnings;

    # Parser.pm and Deparser.pm live in the current directory,
    # which Perl 5.26+ no longer searches by default.
    use lib '.';

    use Data::Dumper qw( Dumper );
    use Parser       qw( );
    use Deparser     qw( );

    {
       my $parser = Parser->new();

       while (<DATA>) {
          chomp;
          print("Unparsed: $_\n");

          my $tree = $parser->parse($_)
             or do {
                print("Bad data: $_\n");
                next;
             };

          my $dumper = Data::Dumper->new([ $tree ]);
          $dumper->Useqq(1);
          $dumper->Terse(1);
          $dumper->Indent(0);
          print("Parsed: ", $dumper->Dump(), "\n");

          my $deparsed = Deparser::deparse($tree);
          print("Deparsed: $deparsed\n");
       } continue {
          print("\n");
       }
    }

    __DATA__
    hap -b
    hap -b sknxharvest01
    hap -b sknxharvest01 -enc testfile.dfo
    hap -b sknxharvest01 -usr cgowing -pass chaspass
    hap -badflag
    hap -prompt
    hap -b sknxharvest01 -prompt
    hap -prompt -b sknxharvest01

Results:

    >perl build_parser.pl

    >perl test.pl
    Unparsed: hap -b
    Parsed: ["hap",[["-b",1]]]
    Deparsed: hap -b 1

    Unparsed: hap -b sknxharvest01
    Parsed: ["hap",[["-b","sknxharvest01"]]]
    Deparsed: hap -b sknxharvest01

    Unparsed: hap -b sknxharvest01 -enc testfile.dfo
    Bad data: hap -b sknxharvest01 -enc testfile.dfo

    Unparsed: hap -b sknxharvest01 -usr cgowing -pass chaspass
    Parsed: ["hap",[["-b","sknxharvest01"],["-usr","cgowing"],["-pass","chaspass"]]]
    Deparsed: hap -b sknxharvest01 -usr cgowing -pass chaspass

    Unparsed: hap -badflag
    Parsed: ["hap",[["-badflag",1]]]
    Deparsed: hap -badflag 1

    Unparsed: hap -prompt
    Parsed: ["hap",[["-prompt",1]]]
    Deparsed: hap -prompt 1

    Unparsed: hap -b sknxharvest01 -prompt
    Parsed: ["hap",[["-b","sknxharvest01"],["-prompt",1]]]
    Deparsed: hap -b sknxharvest01 -prompt 1

    Unparsed: hap -prompt -b sknxharvest01
    Parsed: ["hap",[["-prompt",1],["-b","sknxharvest01"]]]
    Deparsed: hap -prompt 1 -b sknxharvest01

Time for a break! Blessed version later.