YACC rules to regex rules ? (UPDATED)

LanX has asked for the wisdom of the Perl Monks concerning the following question:

I looked in App::a2p trying to understand how awk is translated to Perl

... but it's written in C, which is

far from being my best language and
unfortunate from a maintenance perspective.

As far as I can see, big parts are parser code generated via YACC (and lex?), and these seem to have been defined by the following grammar rules:

From https://metacpan.org/source/LEONT/App-a2p-1.013/a2p.c

const char *yyrule[] = {
"$accept : program",
"program : junk hunks",
"begin : BEGIN '{' maybe states '}' junk",
"end : END '{' maybe states '}'",
"end : end NEWLINE",
"hunks : hunks hunk junk",
"hunks :",
"hunk : patpat",
"hunk : patpat '{' maybe states '}'",
"hunk : FUNCTION USERFUN '(' arg_list ')' maybe '{' maybe states '}'",
"hunk : '{' maybe states '}'",
"hunk : begin",
"hunk : end",
"arg_list : expr_list",
"patpat : cond",
"patpat : cond ',' cond",
"cond : expr",
"cond : match",
"cond : rel",
"cond : compound_cond",
"cond : cond '?' expr ':' expr",
"compound_cond : '(' compound_cond ')'",
"compound_cond : cond ANDAND maybe cond",
"compound_cond : cond OROR maybe cond",
"compound_cond : NOT cond",
"rel : expr RELOP expr",
"rel : expr '>' expr",
"rel : expr '<' expr",
"rel : '(' rel ')'",
"match : expr MATCHOP expr",
"match : expr MATCHOP REGEX",
"match : REGEX",
"match : '(' match ')'",
"expr : term",
"expr : expr term",
"expr : expr '?' expr ':' expr",
"expr : variable ASGNOP cond",
"sprintf : SPRINTF_NEW",
"sprintf : SPRINTF_OLD",
"term : variable",
"term : NUMBER",
"term : STRING",
"term : term '+' term",
"term : term '-' term",
"term : term '*' term",
"term : term '/' term",
"term : term '%' term",
"term : term '^' term",
"term : term IN VAR",
"term : variable INCR",
"term : variable DECR",
"term : INCR variable",
"term : DECR variable",
"term : '-' term",
"term : '+' term",
"term : '(' cond ')'",
"term : GETLINE",
"term : GETLINE variable",
"term : GETLINE '<' expr",
"term : GETLINE variable '<' expr",
"term : term 'p' GETLINE",
"term : term 'p' GETLINE variable",
"term : FUN1",
"term : FUN1 '(' ')'",
"term : FUN1 '(' expr ')'",
"term : FUNN '(' expr_list ')'",
"term : USERFUN '(' expr_list ')'",
"term : SPRINTF_NEW '(' expr_list ')'",
"term : sprintf expr_list",
"term : SUBSTR '(' expr ',' expr ',' expr ')'",
"term : SUBSTR '(' expr ',' expr ')'",
"term : SPLIT '(' expr ',' VAR ',' expr ')'",
"term : SPLIT '(' expr ',' VAR ',' REGEX ')'",
"term : SPLIT '(' expr ',' VAR ')'",
"term : INDEX '(' expr ',' expr ')'",
"term : MATCH '(' expr ',' REGEX ')'",
"term : MATCH '(' expr ',' expr ')'",
"term : SUB '(' expr ',' expr ')'",
"term : SUB '(' REGEX ',' expr ')'",
"term : GSUB '(' expr ',' expr ')'",
"term : GSUB '(' REGEX ',' expr ')'",
"term : SUB '(' expr ',' expr ',' expr ')'",
"term : SUB '(' REGEX ',' expr ',' expr ')'",
"term : GSUB '(' expr ',' expr ',' expr ')'",
"term : GSUB '(' REGEX ',' expr ',' expr ')'",
"variable : VAR",
"variable : VAR '[' expr_list ']'",
"variable : FIELD",
"variable : SVFIELD",
"variable : VFIELD term",
"expr_list : expr",
"expr_list : clist",
"expr_list :",
"clist : expr ',' maybe expr",
"clist : clist ',' maybe expr",
"clist : '(' clist ')'",
"junk : junk hunksep",
"junk :",
"hunksep : ';'",
"hunksep : SEMINEW",
"hunksep : NEWLINE",
"hunksep : COMMENT",
"maybe : maybe nlstuff",
"maybe :",
"nlstuff : NEWLINE",
"nlstuff : COMMENT",
"separator : ';' maybe",
"separator : SEMINEW maybe",
"separator : NEWLINE maybe",
"separator : COMMENT maybe",
"states : states statement",
"states :",
"statement : simple separator maybe",
"statement : ';' maybe",
"statement : SEMINEW maybe",
"statement : compound",
"simpnull : simple",
"simpnull :",
"simple : expr",
"simple : PRINT expr_list redir expr",
"simple : PRINT expr_list",
"simple : PRINTF expr_list redir expr",
"simple : PRINTF expr_list",
"simple : BREAK",
"simple : NEXT",
"simple : EXIT",
"simple : EXIT expr",
"simple : CONTINUE",
"simple : RET",
"simple : RET expr",
"simple : DELETE VAR '[' expr_list ']'",
"redir : '>'",
"redir : GRGR",
"redir : '|'",
"compound : IF '(' cond ')' maybe statement",
"compound : IF '(' cond ')' maybe statement ELSE maybe statement",
"compound : WHILE '(' cond ')' maybe statement",
"compound : DO maybe statement WHILE '(' cond ')'",
"compound : FOR '(' simpnull ';' cond ';' simpnull ')' maybe statement
+",
"compound : FOR '(' simpnull ';' ';' simpnull ')' maybe statement",
"compound : FOR '(' expr ')' maybe statement",
"compound : '{' maybe states '}' maybe",
};
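
A possible first step, sketched here rather than taken from a2p: group the yyrule strings by their left-hand side, so that each nonterminal becomes a single alternation. Only a handful of rules are copied in, and @yyrule, %alts and render are names made up for this illustration.

```perl
use strict;
use warnings;

# A few entries copied from the yyrule[] table above.
my @yyrule = (
    "rel : expr RELOP expr",
    "rel : expr '>' expr",
    "rel : expr '<' expr",
    "rel : '(' rel ')'",
    "junk : junk hunksep",
    "junk :",
);

# Group the right-hand sides by their left-hand side,
# remembering the order in which nonterminals first appear.
my (%alts, @order);
for my $rule (@yyrule) {
    my ($lhs, $rhs) = $rule =~ /^(\S+)\s*:\s*(.*)$/ or next;
    push @order, $lhs unless $alts{$lhs};
    push @{ $alts{$lhs} }, [ split ' ', $rhs ];
}

# Render one production per nonterminal, BNF style.
sub render {
    my ($lhs) = @_;
    my @rhs = map { @$_ ? join(' ', @$_) : '/* empty */' } @{ $alts{$lhs} };
    return "$lhs : " . join(' | ', @rhs);
}

print render($_), "\n" for @order;
```

Grouped this way, rules such as junk : junk hunksep stand out as left recursive. A named pattern that calls itself before consuming any input cannot terminate, so such rules would presumably have to be rewritten as repetition before they could become regexes.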

Question: Is there an obvious way to translate this to Perl regexes?

A result could be a parse tree, inspected by a walker to generate Perl code.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

UPDATE

I just realized that the real YACC/LEX rules are only available on GitHub:

https://github.com/Leont/app-a2p/blob/master/a2p.y

Leon pruned this file from the CPAN version.

This obviously makes more sense than deciphering the generated C files.

Re: YACC rules to regex rules ? by perlfan (Vicar) on Aug 31, 2020 at 18:17 UTC
This may not help and is not what you asked for, but it may lead you down a helpful path nonetheless: there is a thread on converting YACC to ANTLR. The link of interest there is 404'ing, but thankfully the Internet never forgets (most of the time), and the archived link provides a zip. I think it's also Java, so sorry.
Re^2: YACC rules to regex rules ? by LanX (Saint) on Aug 31, 2020 at 22:02 UTC
Thanks. I think I can solve it by myself; the format in a2p.y (which I had to google on GitHub) looks a lot like perlretut#Defining-named-patterns plus embedded Perl code. I'm just asking here before I reinvent the wheel.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
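
To make the named-patterns idea concrete, here is a minimal sketch for a tiny, simplified subset of the grammar (variable, term, expr), assuming Perl 5.10 or later. The token definitions (ws, NUMBER, VAR, FIELD) and the helper is_expr are invented for illustration and are not taken from a2p.y. The sketch only recognizes input; building a parse tree would additionally need embedded code blocks or a separate pass.

```perl
use strict;
use warnings;

# Each yacc nonterminal becomes one named pattern inside (?(DEFINE)...).
# The token patterns are simplified guesses, not the real a2p lexer.
my $grammar = qr{
    (?(DEFINE)
        (?<ws>       \s* )
        (?<NUMBER>   \d+ (?: \. \d+ )? )
        (?<VAR>      [A-Za-z_]\w* )
        (?<FIELD>    \$ \d+ )
        # variable : VAR '[' expr ']' | VAR | FIELD
        (?<variable> (?&VAR) (?&ws) \[ (?&ws) (?&expr) (?&ws) \]
                   | (?&VAR)
                   | (?&FIELD) )
        # term : variable | NUMBER | '(' expr ')'
        (?<term>     (?&variable)
                   | (?&NUMBER)
                   | \( (?&ws) (?&expr) (?&ws) \) )
        # expr : term (('+' | '-') term)*   (left recursion rewritten as a loop)
        (?<expr>     (?&term) (?: (?&ws) [-+] (?&ws) (?&term) )* )
    )
}x;

# True if the whole string is an expression in this toy subset.
sub is_expr {
    my ($src) = @_;
    return $src =~ /\A (?&ws) (?&expr) (?&ws) \z $grammar/x ? 1 : 0;
}

for my $src ('a[1] + $2 - (b - 3)', 'a +', 'x[') {
    printf "%-22s %s\n", $src, is_expr($src) ? 'ok' : 'no match';
}
```

The left-recursive yacc style (expr : expr op term) is replaced here by a loop over term, because a pattern that recurses into itself without consuming input would never terminate.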
Re^3: YACC rules to regex rules ? by LanX (Saint) on Sep 02, 2020 at 13:01 UTC
> perlretut#Defining-named-patterns plus embedded Perl code

Unfortunately it's not obvious how to implement precedence and associativity with named patterns. This requires at least one lookahead for an operator.

Reimplementing the C code from Yacc and Lex would be quite slow.

I looked at CPAN for efficient recursive parsers allowing "precedence" but not much luck.

Marpa::R2 requires an XS module
Parse::RecDescent doesn't mention precedence
Regexp::Grammars neither

I'm giving up here. While I'm sure it's possible to translate YACC rules to efficient regular expressions, it would be quite time consuming. Parser generators are not trivial.

Update

FWIW I found some good threads on the topic, but it'd be cool to transform YACC rules to efficient regexes, because we could easily adapt a parser to language changes.

NB: There are multiple versions of AWK available.

Some interesting threads

Parsing with Regexes and Beyond
Re: Use cases for 'sub Pckg::func { }' ?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
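
On the precedence and associativity point: one well-known way to get both with a single token of lookahead is precedence climbing, done in plain Perl on top of a regex tokenizer instead of inside one big regex. The sketch below is a swapped-in technique, not what a2p does. The precedence table follows the usual awk conventions as an assumption (it is not copied from a2p.y), and tokenize, parse_primary, parse_expr, show and parse are hypothetical helper names.

```perl
use strict;
use warnings;

# Assumed precedence table (usual awk conventions, not copied from a2p.y):
# a higher number binds tighter; '^' is right associative.
my %op = (
    '+' => [ 1, 'left'  ], '-' => [ 1, 'left' ],
    '*' => [ 2, 'left'  ], '/' => [ 2, 'left' ], '%' => [ 2, 'left' ],
    '^' => [ 3, 'right' ],
);

# Split the input into numbers, names, operators and parentheses.
# Anything else (including whitespace) is silently skipped in this sketch.
sub tokenize {
    my ($src) = @_;
    return $src =~ m{ \d+ (?: \.\d+ )? | [A-Za-z_]\w* | [-+*/%^()] }gx;
}

# Parse a primary: a number, a name, or a parenthesised expression.
sub parse_primary {
    my ($tok) = @_;
    my $t = shift @$tok;
    die "unexpected end of input\n" unless defined $t;
    if ($t eq '(') {
        my $inner = parse_expr($tok, 1);
        my $close = shift @$tok;
        die "expected ')'\n" unless defined $close && $close eq ')';
        return $inner;
    }
    die "unexpected token '$t'\n" if $t eq ')' || $op{$t};
    return $t;
}

# Precedence climbing: peek at the next operator (one token of lookahead)
# and only consume it if it binds at least as tightly as $min.
sub parse_expr {
    my ($tok, $min) = @_;
    my $lhs = parse_primary($tok);
    while (@$tok && $op{ $tok->[0] } && $op{ $tok->[0] }[0] >= $min) {
        my $o = shift @$tok;
        my ($prec, $assoc) = @{ $op{$o} };
        my $rhs = parse_expr($tok, $assoc eq 'left' ? $prec + 1 : $prec);
        $lhs = [ $o, $lhs, $rhs ];    # tree node: [operator, left, right]
    }
    return $lhs;
}

# Render the tree as a fully parenthesised string for inspection.
sub show {
    my ($n) = @_;
    return ref($n) ? "(" . show($n->[1]) . " $n->[0] " . show($n->[2]) . ")" : $n;
}

sub parse {
    my @tok  = tokenize($_[0]);
    my $tree = parse_expr(\@tok, 1);
    die "trailing input: @tok\n" if @tok;
    return $tree;
}

print show(parse('1 + 2 * 3 - 4')), "\n";
print show(parse('2 ^ 3 ^ 2')), "\n";
```

The result is a nested array that a walker could turn into Perl code. Changing precedence or associativity only means editing the %op table, which is roughly what yacc's %left and %right declarations provide.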
Re^4: YACC rules to regex rules ? by tybalt89 (Monsignor) on Sep 02, 2020 at 13:24 UTC
Re^4: YACC rules to regex rules ? by Anonymous Monk on Sep 04, 2020 at 03:03 UTC

