In search of grammars for parsing

by coppit (Beadle)
on Aug 06, 2004 at 00:49 UTC (#380425)
coppit has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm writing a code generator that needs to parse a YACC-like input file (including partially parsing the C/C++ parts of the head and tail sections), as well as a LEX-like input file.
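Before any grammar is involved, the section-level layout of a YACC-style file can be sketched with plain regexes. This is Python purely for illustration (the same regex approach carries over to Perl). The function name `split_yacc_sections` and the sample input are hypothetical, and the sketch assumes that `%%`, `%{` and `%}` each sit alone on a line and never appear inside a comment or string.

```python
import re

# Hypothetical sample input: C prologue, one declaration, one rule, C epilogue.
YACC_SAMPLE = """%{
#include <stdio.h>
int line_count = 0;
%}
%token NUMBER
%%
expr : NUMBER ;
%%
int main(void) { return yyparse(); }
"""

# Literal C code in the definitions section is wrapped in %{ ... %}.
C_BLOCK = re.compile(r"^%\{[ \t]*\n(.*?)^%\}[ \t]*$\n?", re.MULTILINE | re.DOTALL)


def split_yacc_sections(text):
    """Return (c_prologue, declarations, rules, epilogue) as strings.

    Assumption: separators are on lines of their own; a %% inside a
    string or comment would confuse this naive split.
    """
    # A line consisting solely of %% separates the top-level sections.
    parts = re.split(r"^%%[ \t]*$\n?", text, maxsplit=2, flags=re.MULTILINE)
    if len(parts) < 2:
        raise ValueError("no %% separator found")
    definitions, rules = parts[0], parts[1]
    # The second %% and the epilogue are optional in yacc input.
    epilogue = parts[2] if len(parts) == 3 else ""
    # Collect the C code blocks, then remove them from the declarations.
    prologue = "".join(C_BLOCK.findall(definitions))
    declarations = C_BLOCK.sub("", definitions)
    return prologue, declarations, rules, epilogue
```

Calling `split_yacc_sections(YACC_SAMPLE)` yields the four pieces separately, so the C parts can go to a C-aware pass while the declarations and rules go to a real grammar.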

I figured I would start with existing YACC and LEX grammars, and modify them to add my new features. Unfortunately, I'm having trouble finding suitable Perl tools for parsing YACC, LEX, C, and C++. (I know the latter two are hard; I only need a partial parse.) Does anyone know where I can find grammars for these languages?

Here's what my research has uncovered:

Parsing a YACC file

  • Parse::Yapp includes YappParse.yp, which partially parses most of a YACC grammar. I'd have to modify it to parse the tokens and union declaration.
  • Parse::RecDescent shows a metagrammar in the POD, but I can't find the real thing in the distribution. I could use Parse::RecDescent::Deparse to recover it, but this probably isn't advisable, and I'm not sure how close it is to recognizing a real YACC grammar anyway.

Parsing a LEX file

  • I can't find a suitable grammar anywhere. :(

Parsing C/C++

  • I could try Inline::C::ParseRegExp or Inline::C::ParseRecDescent for C, and Inline::CPP::grammar for C++.
  • Parse::RecDescent has a pretty functional demo_Cgrammar.pl, but the demo_cpp.pl seems a bit limited.
  • PERCEPS (http://starship.python.net/crew/tbryan/PERCEPS/) is a Perl header file parser. Not sure how well it would work for the head section of a YACC or LEX input file.

In the worst case, both the bison and flex distributions have grammars, which I could adapt into suitable input for Parse::RecDescent or Parse::Yapp. But I can't help thinking that someone has already done all this work.

I need to fully parse the YACC and LEX input files, but for the C and C++ code I just need to locate any definitions so that I can move them to an implementation file and replace them with "extern" declarations. (I'm generating multiple source files from the YACC/LEX files, instead of the normal one big implementation file.)
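A minimal sketch of that last step, again in Python for illustration only and not a real C parser. The names `externalize`, `DEFINITION`, and `SKIP_KEYWORDS` and the sample code are hypothetical. It assumes one-line variable definitions starting in column 0, with code inside function bodies indented, and it deliberately leaves alone anything it is unsure about (`static`, `struct`/`union`/`enum` lines, multi-variable declarations, function prototypes).

```python
import re

# Hypothetical sample of file-scope C code from a head section.
C_SAMPLE = """#include <stdio.h>
int line_count = 0;
char *current_file;
static int depth;
int yylex(void);"""

# One-line variable definition at column 0, e.g. "int n = 0;" or "char *p;".
DEFINITION = re.compile(
    r"^(?P<type>[A-Za-z_][\w \t]*?[ \t*]+)"  # type, including pointer stars
    r"(?P<name>[A-Za-z_]\w*)"                # variable name
    r"(?P<array>(?:\[[^\]]*\])*)"            # optional array dimensions
    r"[ \t]*(?:=[^;]*)?;[ \t]*$"             # optional initializer, then ;
)

# Lines starting with these words are left untouched (conservative choice).
SKIP_KEYWORDS = {"extern", "static", "typedef", "struct", "union", "enum",
                 "return", "goto"}


def externalize(c_code):
    """Return (header_text, impl_text).

    header_text is c_code with each recognized definition replaced by an
    extern declaration; impl_text holds the original definitions.
    """
    kept, moved = [], []
    for line in c_code.splitlines():
        match = DEFINITION.match(line)
        first_word = line.split(None, 1)[0] if line.strip() else ""
        if match and first_word not in SKIP_KEYWORDS:
            moved.append(line)  # the definition goes to the implementation file
            kept.append("extern %s%s%s;" % (
                match.group("type"), match.group("name"), match.group("array")))
        else:
            kept.append(line)   # everything else stays where it was
    return "\n".join(kept), "\n".join(moved)
```

On the sample, only `line_count` and `current_file` are turned into `extern` declarations; the `static` variable and the prototype are passed through unchanged. A regex pass like this breaks down quickly on real C++, which is why a proper grammar is still the goal.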

Thanks for the help!
David

Replies are listed 'Best First'.
Re: In search of grammars for parsing
by jaldhar (Vicar) on Aug 06, 2004 at 03:14 UTC

    For Yacc at least, how about perl-byacc? It's a modified Berkeley yacc which outputs perl. There's a Debian package fwiw.

    --
    જલધર

Re: In search of grammars for parsing
by kvale (Monsignor) on Aug 06, 2004 at 04:52 UTC
    ParseLex is a lexical parser along the lines of Lex and may be a good starting point. On the other hand, the instructions are in French.

    A grammar for flex input is in the flex package itself. Check out parse.y in the top directory.

    -Mark

Re: In search of grammars for parsing
by Velaki (Chaplain) on Aug 06, 2004 at 11:30 UTC

    Just a thought, but if worst comes to worst, what about using some of the directed graph modules to build your own parser? As long as you can find a grammar, things should be Jake.

    These notes might point you in the right direction for the appropriate modules. They even mention Parse::Lex, which should be helpful, I hope.

    -v
    "Perl. There is no substitute."
