
YACC rules to regex rules ? (UPDATED)

by LanX (Cardinal)
on Aug 31, 2020 at 16:32 UTC

LanX has asked for the wisdom of the Perl Monks concerning the following question:


I looked in App::a2p trying to understand how awk is translated to Perl

... but it's written in C, which is

  • far from being my best language and
  • unfortunate from a maintenance perspective.
As far as I can see, large parts are parser code generated via YACC (and lex?), and these seem to have been defined by the following grammar rules:


Question: Is there an obvious way to translate this to Perl regexes?

A result could be a parse tree, inspected by a walker to generate Perl code.
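To make the "parse tree plus walker" idea concrete, here is a minimal sketch. The tree shape (`[ type, @children ]`), the node names and the awk-to-Perl field mapping are all hypothetical and chosen for illustration only; none of them is taken from a2p.

```perl
use strict;
use warnings;

# Hypothetical emitters, one per node type (not taken from a2p).
# Each receives the already generated code of its children.
my %emit = (
    num   => sub { $_[0] },
    field => sub { $_[0] == 0 ? '$_' : '$F[' . ( $_[0] - 1 ) . ']' },
    add   => sub { "($_[0] + $_[1])" },
    mul   => sub { "($_[0] * $_[1])" },
    print => sub { "print $_[0];" },
);

# Depth-first walker: generate code for the children, then for the node.
sub walk {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: plain literal value
    my ( $type, @kids ) = @$node;
    my $handler = $emit{$type} or die "unknown node type '$type'\n";
    return $handler->( map { walk($_) } @kids );
}

# Assumed tree for the awk statement:  print $1 + $2 * 3
my $tree = [ print => [ add => [ field => 1 ],
                                [ mul => [ field => 2 ], [ num => 3 ] ] ] ];
my $perl = walk($tree);
```

A dispatch table keeps the walker itself trivial; supporting a new construct would only mean adding another entry to `%emit`.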

Cheers Rolf
(addicted to the Perl Programming Language :)
I just realized that the real YACC/LEX rules are only available on GitHub

Leon pruned this file from the CPAN version.

This obviously makes more sense than deciphering the generated C files.

Re: YACC rules to regex rules ?
by perlfan (Vicar) on Aug 31, 2020 at 18:17 UTC

      I think I can solve it by myself. The format in a2p.y (which I had to search for on GitHub) looks a lot like perlretut#Defining-named-patterns plus embedded Perl code.

      I'm just asking here before I reinvent the wheel.
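For readers who have not used named patterns: the sketch below shows how a few yacc-like rules might look as a `(?(DEFINE)...)` block. The rule names `expr`, `term` and `factor` and the tiny expression language are assumptions for illustration, not rules copied from a2p.y. It is only a recognizer, with no embedded code and no tree building.

```perl
use 5.010;    # named and recursive patterns need Perl 5.10 or later
use strict;
use warnings;

# Each (?<name>...) inside DEFINE plays the role of one grammar rule.
my $grammar = qr{
    (?(DEFINE)
        (?<expr>   (?&term)   (?: \s* [-+] \s* (?&term)   )* )
        (?<term>   (?&factor) (?: \s* [*/] \s* (?&factor) )* )
        (?<factor> \d+ | \$\d+ | \( \s* (?&expr) \s* \) )
    )
}x;

# Returns 1 if the whole string is a valid expression, 0 otherwise.
sub is_expr {
    my ($text) = @_;
    return $text =~ m{ \A \s* (?&expr) \s* \z $grammar }x ? 1 : 0;
}
```

Note that yacc grammars are often left-recursive (`expr : expr '+' term`), which a recursive regex cannot follow directly; here such rules are rewritten as repetition, `(?&term) (?: [-+] (?&term) )*`.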

      Cheers Rolf
      (addicted to the Perl Programming Language :)
        > perlretut#Defining-named-patterns plus embedded Perl code

        Unfortunately it's not obvious how to implement precedence and associativity with named patterns.

        This requires at least one lookahead for an operator.
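One common way to get precedence and associativity with a single token of lookahead is precedence climbing, sketched below. This is not how a2p or yacc does it; the operator table plays a role similar to yacc's `%left` / `%right` declarations, and the operators, binding powers and sub names are illustrative assumptions.

```perl
use strict;
use warnings;

# Illustrative operator table: [ binding power, associativity ].
my %op = (
    '+' => [ 1, 'left' ], '-' => [ 1, 'left' ],
    '*' => [ 2, 'left' ], '/' => [ 2, 'left' ],
    '^' => [ 3, 'right' ],
);

# Naive tokenizer: integers, operators, parens (anything else is skipped).
sub tokenize { [ $_[0] =~ m{ \d+ | [-+*/^()] }xg ] }

sub parse_atom {
    my ($tok) = @_;
    my $t = shift @$tok;
    die "unexpected end of input\n" unless defined $t;
    if ( $t eq '(' ) {
        my $inner = parse_expr( $tok, 1 );
        my $close = shift @$tok;
        die "expected ')'\n" unless defined $close and $close eq ')';
        return $inner;
    }
    die "unexpected token '$t'\n" unless $t =~ /\A\d+\z/;
    return $t;
}

sub parse_expr {
    my ( $tok, $min_prec ) = @_;
    my $lhs = parse_atom($tok);
    while (@$tok) {
        # One-token lookahead: peek at the operator before consuming it.
        my $info = $op{ $tok->[0] } or last;
        my ( $prec, $assoc ) = @$info;
        last if $prec < $min_prec;
        my $operator = shift @$tok;
        # Left-associative operators require a tighter right-hand side.
        my $rhs = parse_expr( $tok, $assoc eq 'left' ? $prec + 1 : $prec );
        $lhs = [ $operator, $lhs, $rhs ];
    }
    return $lhs;
}

sub parse {
    my $tok  = tokenize( $_[0] );
    my $tree = parse_expr( $tok, 1 );
    die "trailing input: @$tok\n" if @$tok;
    return $tree;
}

# Render the tree as an s-expression to make the grouping visible.
sub to_sexpr {
    my ($n) = @_;
    return $n unless ref($n);
    return '(' . join( ' ', $n->[0], map { to_sexpr($_) } @$n[ 1, 2 ] ) . ')';
}
```

For example, `to_sexpr( parse('1 - 2 - 3') )` groups to the left as `(- (- 1 2) 3)`, while `2 ^ 3 ^ 2` groups to the right as `(^ 2 (^ 3 2))`.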

        Reimplementing the C code from Yacc and Lex would be quite slow.

        I looked on CPAN for efficient recursive parsers that allow "precedence", but without much luck.

        I'm giving up here.

        While I'm sure it's possible to translate YACC rules to efficient regular expressions, it would be quite time-consuming.

        Parser generators are not trivial.


        FWIW I found some good threads on the topic, but it'd be cool to transform YACC rules into efficient regexes, because then we could easily adapt a parser to language changes.

        NB: There are multiple versions of AWK available.

        Some interesting threads

        Cheers Rolf
        (addicted to the Perl Programming Language :)
