PerlMonks  

YACC rules to regex rules ? (UPDATED)

by LanX (Saint)
on Aug 31, 2020 at 16:32 UTC ( [id://11121228]=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I looked into App::a2p, trying to understand how awk is translated to Perl

... but it's written in C, which is

  • far from being my best language and
  • unfortunate from a maintenance perspective.

As far as I can see, big parts are parser code generated via YACC (and lex?), and these seem to have been defined by the following grammar rules:

From https://metacpan.org/source/LEONT/App-a2p-1.013/a2p.c

const char *yyrule[] = {
    "$accept : program",
    "program : junk hunks",
    "begin : BEGIN '{' maybe states '}' junk",
    "end : END '{' maybe states '}'",
    "end : end NEWLINE",
    "hunks : hunks hunk junk",
    "hunks :",
    "hunk : patpat",
    "hunk : patpat '{' maybe states '}'",
    "hunk : FUNCTION USERFUN '(' arg_list ')' maybe '{' maybe states '}'",
    "hunk : '{' maybe states '}'",
    "hunk : begin",
    "hunk : end",
    "arg_list : expr_list",
    "patpat : cond",
    "patpat : cond ',' cond",
    "cond : expr",
    "cond : match",
    "cond : rel",
    "cond : compound_cond",
    "cond : cond '?' expr ':' expr",
    "compound_cond : '(' compound_cond ')'",
    "compound_cond : cond ANDAND maybe cond",
    "compound_cond : cond OROR maybe cond",
    "compound_cond : NOT cond",
    "rel : expr RELOP expr",
    "rel : expr '>' expr",
    "rel : expr '<' expr",
    "rel : '(' rel ')'",
    "match : expr MATCHOP expr",
    "match : expr MATCHOP REGEX",
    "match : REGEX",
    "match : '(' match ')'",
    "expr : term",
    "expr : expr term",
    "expr : expr '?' expr ':' expr",
    "expr : variable ASGNOP cond",
    "sprintf : SPRINTF_NEW",
    "sprintf : SPRINTF_OLD",
    "term : variable",
    "term : NUMBER",
    "term : STRING",
    "term : term '+' term",
    "term : term '-' term",
    "term : term '*' term",
    "term : term '/' term",
    "term : term '%' term",
    "term : term '^' term",
    "term : term IN VAR",
    "term : variable INCR",
    "term : variable DECR",
    "term : INCR variable",
    "term : DECR variable",
    "term : '-' term",
    "term : '+' term",
    "term : '(' cond ')'",
    "term : GETLINE",
    "term : GETLINE variable",
    "term : GETLINE '<' expr",
    "term : GETLINE variable '<' expr",
    "term : term 'p' GETLINE",
    "term : term 'p' GETLINE variable",
    "term : FUN1",
    "term : FUN1 '(' ')'",
    "term : FUN1 '(' expr ')'",
    "term : FUNN '(' expr_list ')'",
    "term : USERFUN '(' expr_list ')'",
    "term : SPRINTF_NEW '(' expr_list ')'",
    "term : sprintf expr_list",
    "term : SUBSTR '(' expr ',' expr ',' expr ')'",
    "term : SUBSTR '(' expr ',' expr ')'",
    "term : SPLIT '(' expr ',' VAR ',' expr ')'",
    "term : SPLIT '(' expr ',' VAR ',' REGEX ')'",
    "term : SPLIT '(' expr ',' VAR ')'",
    "term : INDEX '(' expr ',' expr ')'",
    "term : MATCH '(' expr ',' REGEX ')'",
    "term : MATCH '(' expr ',' expr ')'",
    "term : SUB '(' expr ',' expr ')'",
    "term : SUB '(' REGEX ',' expr ')'",
    "term : GSUB '(' expr ',' expr ')'",
    "term : GSUB '(' REGEX ',' expr ')'",
    "term : SUB '(' expr ',' expr ',' expr ')'",
    "term : SUB '(' REGEX ',' expr ',' expr ')'",
    "term : GSUB '(' expr ',' expr ',' expr ')'",
    "term : GSUB '(' REGEX ',' expr ',' expr ')'",
    "variable : VAR",
    "variable : VAR '[' expr_list ']'",
    "variable : FIELD",
    "variable : SVFIELD",
    "variable : VFIELD term",
    "expr_list : expr",
    "expr_list : clist",
    "expr_list :",
    "clist : expr ',' maybe expr",
    "clist : clist ',' maybe expr",
    "clist : '(' clist ')'",
    "junk : junk hunksep",
    "junk :",
    "hunksep : ';'",
    "hunksep : SEMINEW",
    "hunksep : NEWLINE",
    "hunksep : COMMENT",
    "maybe : maybe nlstuff",
    "maybe :",
    "nlstuff : NEWLINE",
    "nlstuff : COMMENT",
    "separator : ';' maybe",
    "separator : SEMINEW maybe",
    "separator : NEWLINE maybe",
    "separator : COMMENT maybe",
    "states : states statement",
    "states :",
    "statement : simple separator maybe",
    "statement : ';' maybe",
    "statement : SEMINEW maybe",
    "statement : compound",
    "simpnull : simple",
    "simpnull :",
    "simple : expr",
    "simple : PRINT expr_list redir expr",
    "simple : PRINT expr_list",
    "simple : PRINTF expr_list redir expr",
    "simple : PRINTF expr_list",
    "simple : BREAK",
    "simple : NEXT",
    "simple : EXIT",
    "simple : EXIT expr",
    "simple : CONTINUE",
    "simple : RET",
    "simple : RET expr",
    "simple : DELETE VAR '[' expr_list ']'",
    "redir : '>'",
    "redir : GRGR",
    "redir : '|'",
    "compound : IF '(' cond ')' maybe statement",
    "compound : IF '(' cond ')' maybe statement ELSE maybe statement",
    "compound : WHILE '(' cond ')' maybe statement",
    "compound : DO maybe statement WHILE '(' cond ')'",
    "compound : FOR '(' simpnull ';' cond ';' simpnull ')' maybe statement",
    "compound : FOR '(' simpnull ';' ';' simpnull ')' maybe statement",
    "compound : FOR '(' expr ')' maybe statement",
    "compound : '{' maybe states '}' maybe",
};

Question: Is there an obvious way to translate this to Perl regexes?

A result could be a parse tree, inspected by a walker to generate Perl code.
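
To make the parse-tree idea a bit more concrete, here is a minimal sketch of such a walker. The tree layout (nested array refs of the form [ op, children... ], with plain strings as leaves) and the names emit_perl, leaf and %binop are assumptions made up for this illustration; none of them come from a2p.

```perl
use strict;
use warnings;

# Hypothetical node layout: leaves are plain strings,
# inner nodes are array refs [ $op, @children ].
my %binop = map { $_ => $_ } qw(+ - * / %);
$binop{'^'} = '**';    # awk's ^ (exponentiation) is ** in Perl

# Walk the tree recursively and return Perl source text.
sub emit_perl {
    my ($node) = @_;
    return leaf($node) unless ref $node;
    my ($op, @kids) = @$node;
    if (exists $binop{$op} && @kids == 2) {
        return '(' . emit_perl($kids[0]) . " $binop{$op} "
                   . emit_perl($kids[1]) . ')';
    }
    if ($op eq 'print') {
        # awk's print appends a newline; make that explicit in Perl
        my @args = map { emit_perl($_) } @kids;
        return 'print(' . join(', ', @args, '"\n"') . ');';
    }
    die "unhandled node type '$op'\n";
}

# Numbers pass through, anything else is treated as an awk variable.
sub leaf {
    my ($tok) = @_;
    return $tok if $tok =~ /\A\d+(?:\.\d+)?\z/;
    return "\$$tok";
}

my $tree = [ 'print', [ '+', 'x', [ '^', 'y', '2' ] ] ];
print emit_perl($tree), "\n";
# prints: print(($x + ($y ** 2)), "\n");
```

The walker only knows two kinds of inner node here; a real translator would need one case per grammar construct, which is where most of the work would be.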

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

UPDATE

I just realized that the real YACC/LEX rules are only available on GitHub

https://github.com/Leont/app-a2p/blob/master/a2p.y

Leon pruned this file from the CPAN version.

This obviously makes more sense than deciphering the generated C files.

Replies are listed 'Best First'.
Re: YACC rules to regex rules ?
by perlfan (Vicar) on Aug 31, 2020 at 18:17 UTC
      Thanks.

      I think I can solve it by myself; the format in a2p.y (which I had to google on GitHub) looks a lot like perlretut#Defining-named-patterns plus embedded Perl code.

      I'm just asking here before I reinvent the wheel.
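
For illustration, here is a rough sketch of how a handful of the expression rules might look as named patterns inside a (?(DEFINE)...) block. It is heavily simplified and rests on assumptions: the token definitions for NUMBER and VAR are guesses rather than a2p's lexer rules, the pattern names are invented, and left-recursive rules such as "term : term '+' term" are rewritten as repetition, because a pattern that recurses into itself without consuming input makes Perl die with "Infinite recursion in regex". It only accepts or rejects input; it builds no tree.

```perl
use strict;
use warnings;

my $expr_re = qr{
    \A \s* (?&cond) \s* \z
    (?(DEFINE)
        # cond: optional comparison between two terms
        (?<cond>     (?&term) (?: \s* (?: [<>]=? | [=!]= ) \s* (?&term) )? )
        # term: left recursion replaced by "factor (op factor)*"
        (?<term>     (?&factor) (?: \s* [-+*/%^] \s* (?&factor) )* )
        # factor: optional unary sign, then operand or parenthesised cond
        (?<factor>   [-+]? \s*
                     (?: (?&NUMBER) | (?&variable) | \( \s* (?&cond) \s* \) ) )
        # variable: plain name or array element
        (?<variable> (?&VAR) (?: \[ \s* (?&cond) \s* \] )? )
        (?<NUMBER>   \d+ (?: \. \d+ )? )
        (?<VAR>      [A-Za-z_] \w* )
    )
}x;

for my $src ('a[i] + 2 * (b - 1)', 'x >= y ^ 2', '1 + * 2') {
    printf "%-20s %s\n", $src, $src =~ $expr_re ? 'ok' : 'no match';
}
```

The first two inputs should match and the last one should not. Note that all operators sit in one flat character class, so the pattern says nothing about precedence or associativity.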

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        > perlretut#Defining-named-patterns plus embedded Perl code

        Unfortunately it's not obvious how to implement precedence and associativity with named patterns.

        This requires at least one token of lookahead for an operator.
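
One common way to get both without a generated parser table is precedence climbing: the regexes only tokenize (using \G with /gc), and a small recursive function peeks at the next operator to decide whether it belongs to the current level. The following is a minimal sketch under stated assumptions: the %op table is modelled on the usual awk arithmetic precedences rather than taken from a2p.y, and the names parse, parse_expr, parse_atom and show are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical operator table: [ binding power, associativity ].
my %op = (
    '+' => [ 1, 'left' ],  '-' => [ 1, 'left' ],
    '*' => [ 2, 'left' ],  '/' => [ 2, 'left' ],  '%' => [ 2, 'left' ],
    '^' => [ 3, 'right' ],
);

sub parse {
    my ($src) = @_;
    pos($src) = 0;
    my $tree = parse_expr(\$src, 1);
    $src =~ /\G\s*\z/gc
        or die "syntax error at offset " . pos($src) . "\n";
    return $tree;
}

sub parse_expr {
    my ($s, $min_prec) = @_;
    my $lhs = parse_atom($s);
    while (1) {
        my $save = pos($$s);
        last unless $$s =~ m{\G\s*([-+*/%^])}gc;   # one token of lookahead
        my $op = $1;
        my ($prec, $assoc) = @{ $op{$op} };
        if ($prec < $min_prec) {    # operator belongs to an outer level
            pos($$s) = $save;       # un-read it
            last;
        }
        # left-associative: the right side may only bind tighter
        my $rhs = parse_expr($s, $assoc eq 'left' ? $prec + 1 : $prec);
        $lhs = [ $op, $lhs, $rhs ];
    }
    return $lhs;
}

sub parse_atom {
    my ($s) = @_;
    if ($$s =~ /\G\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*)/gc) {
        my $tok = $1;
        return $tok;
    }
    if ($$s =~ /\G\s*\(/gc) {
        my $inner = parse_expr($s, 1);
        $$s =~ /\G\s*\)/gc
            or die "expected ')' at offset " . pos($$s) . "\n";
        return $inner;
    }
    die "expected operand at offset " . pos($$s) . "\n";
}

# Render the tree in prefix notation for inspection.
sub show {
    my ($n) = @_;
    return $n unless ref($n);
    return '(' . join(' ', $n->[0], show($n->[1]), show($n->[2])) . ')';
}

print show(parse('1 - 2 - 3 * x ^ 2 ^ y')), "\n";
# prints: (- (- 1 2) (* 3 (^ x (^ 2 y))))
```

Changing precedence or associativity then means editing %op, not the parsing code.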

        Reimplementing the C code from Yacc and Lex would be quite slow.

        I looked on CPAN for efficient recursive parsers that support "precedence", but without much luck.

        I'm giving up here.

        While I'm sure it's possible to translate YACC rules to efficient regular expressions, it would be quite time consuming.

        Parser generators are not trivial.

        Update

        FWIW I found some good threads on the topic, but it'd be cool to transform YACC rules to efficient regexes, because we could easily adapt a parser to language changes.

        NB: There are multiple versions of AWK available.

        Some interesting threads

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
