PerlMonks  

Yacc is dead

by casiano (Pilgrim)
on Dec 08, 2010 at 19:26 UTC ( [id://876097]=perlmeditation )

The paper “Yacc is dead” by Matthew Might and David Darais (http://arxiv.org/abs/1010.5023) is receiving a lot of attention (see the discussions on "Lambda the Ultimate" and the critique by Russ Cox, one of the developers of Google's Go language).

The paper starts with a harsh critique of the practice of parsing context-free languages with (not-really-)regular expressions in languages like Perl.

It uses the term “cargo cult parsing”, apparently introduced by Larry Wall, to refer to cut-and-paste imitation: copying “magic” regular expressions without understanding them.

The paper says that people abuse regular expressions instead of turning to tools like yacc because

“regular expressions are `WYSIWYG'—the language described is the language that gets matched—whereas parser-generators are WYSIWYGIYULR(k)—`what you see is what you get if you understand LR(k).'"

What is your opinion on the subject?

Should we run to implement the derivative parsers described in the paper in Perl?

How does it relate to current Perl parsers?
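For readers who have not met the idea before: the paper builds on Brzozowski's derivative of a language with respect to a character. What follows is a minimal sketch of that idea for plain regular expressions only, written in Python. It is not the paper's context-free algorithm (which additionally relies on laziness, memoization and fixed points), and every class and function name in it is illustrative rather than taken from the paper.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression nodes (names are illustrative)."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r):
    """True if the language of r contains the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False  # Empty and Chr


def derive(r, ch):
    """Derivative of r with respect to ch: what may follow once ch is consumed."""
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, ch), derive(r.right, ch))
    if isinstance(r, Seq):
        head = Seq(derive(r.first, ch), r.second)
        # If the first part can be empty, ch may also start the second part.
        return Alt(head, derive(r.second, ch)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(derive(r.inner, ch), r)
    return Empty()  # Empty and Eps have no non-empty strings to consume


def matches(r, text):
    """Differentiate once per character, then test for the empty string."""
    for ch in text:
        r = derive(r, ch)
    return nullable(r)


# The regular expression (ab)*
AB_STAR = Star(Seq(Chr("a"), Chr("b")))
```

Calling `matches(AB_STAR, "abab")` accepts the input and `matches(AB_STAR, "aba")` rejects it. The sketch does no simplification of the derived expressions, so they grow with the input; a practical implementation would compact them.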

Replies are listed 'Best First'.
Re: Yacc is dead
by BrowserUk (Patriarch) on Dec 08, 2010 at 22:22 UTC

    Let's see. The choice is between using:

    • a tool that gives you "the language described is the language that gets matched";

      Which sounds like a paraphrase for "does exactly what is required".

    • some other tool that doesn't;

      and is sufficiently complex that at least some, perhaps many, find it difficult to understand and use.

    What benefit is there for the latter?

    That it can only be written and maintained in obscure, compiled languages by people with an MS in theoretical BS perhaps?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      First: as Russ points out, the critique of the practice of parsing context-free languages with regular expressions is scathing. I would add that it is unfair.

      The problem lies with people who cut and paste “magic” regular expressions without fully understanding them. You must understand what you are doing.

      In fact, I am not so sure about the merits of the algorithm proposed in the "Yacc is dead" paper.

      According to Russ's analysis, the proposed algorithm, because it backtracks, fails to guarantee the required efficiency. He said:

      'Instead of ending the supposed problem of cargo cult parsing, the paper ends up being a prime example of what Richard Feynman called “cargo cult science” ... in which a successful line of research is imitated but without some key aspect that made the original succeed'

      There is nothing wrong with using regular expressions to parse context-free languages. I do it. My concern is the lack of some important parsing algorithms, such as GLR, on CPAN.

      I am also sure that yacc is not dead (yet). “The asteroid to kill this dinosaur is still in orbit.”

        I admit it's been a long time since I had to parse something as "complex" as a context-free language. But I've done quite a lot of regular expression mojo since.

        Even with 5.10 regular expressions, I would not use regular expressions to parse something. I'd use regular expressions to tokenize the stream, but I'd use some kind of state machine for the parsing. I do consider Parse::RecDescent to implement a state machine, with hooks for callbacks, that uses regular expressions to tokenize. (Of course, regular expressions themselves are state machines...)

        Of course, I may change my mind halfway through the project....
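The division of labour described in the reply above (regular expressions for tokenizing, a separate parser for the structure) can be sketched roughly as follows. This is a hand-written recursive-descent parser in Python for a hypothetical arithmetic grammar. It is not the Parse::RecDescent API, and the grammar, token names and function names are all assumptions made for illustration.

```python
import re

# The regex only recognises individual tokens; it knows nothing about nesting.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<op>[-+*/()]))")


def tokenize(text):
    """Split text into ('num', int) and ('op', char) tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group("num") is not None:
            tokens.append(("num", int(m.group("num"))))
        else:
            tokens.append(("op", m.group("op")))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent over the token list; handles the context-free part."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end", None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        kind, val = self.take()
        if kind == "num":
            return val
        if (kind, val) == ("op", "("):
            value = self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(text):
    """Tokenize with the regex, parse with the Parser, reject trailing input."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() != ("end", None):
        raise SyntaxError("trailing input")
    return value
```

For example, `evaluate("2 + 3 * (4 - 1)")` walks the nested parentheses through the recursive `expr` call, which is exactly the part a single classical regular expression cannot express.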

Re: Yacc is dead
by moritz (Cardinal) on Dec 09, 2010 at 07:13 UTC
Re: Yacc is dead
by zby (Vicar) on Dec 09, 2010 at 07:40 UTC
    A note about this sentence: "what you see is what you get if you understand LR(k)" (and also about some of Steve Yegge's popular writings).

    LR parsing is an interesting idea - but it's just a tool. A lot of stuff in programming is stated in a way that tries to induce the feeling of 'we understand *** - we are elite, while the unwashed masses still use xxx'. There is nothing wrong with learning new things - but to be honest parsing is not that important in the practice of programming anymore. If you need it then it is mostly the case of a widespread language that has already good parsers. If you want to improve those parsers - then yes, you need to know the theory - but you can also learn some other theory and improve other libraries, or maybe learn something higher level, like the stuff that Misko Hevery is blogging about, and improve your overall programming practice.

      If you need it then it is mostly the case of a widespread language that has already good parsers.

      • There are Earley parsers in Perl 5: Parse::Earley and Marpa.
      • There are no GLR parsers in Perl 5, though GLR is one of the most commonly used algorithms nowadays (search for GLR and Perl on Google).
      • I believe, though I am not really sure, that although there is no explicit Packrat-parsing module, Regexp::Grammars conforms to the Packrat approach.
      • As far as I am aware, there is only one CPAN module that supports attribute grammars: Language::AttributeGrammar.
      These are the available choices for the main "Parsing Algorithms" that I know of in Perl 5/CPAN. Anything else?
        OK - I agree that writing a parser using these new techniques would be valuable, and you apparently did a lot of background research, and this is all good for the community etc. My note was only a nitpick - don't take it too personally. I only wanted to say that at my work the standard HTML parser (or YAML parser or XML parser or .INI parser) has been enough for everything I have ever needed. Writing a GLR parser or something else would be a great contribution - but so would solving numerous other problems. My point really is that condescending to people for getting things done with whatever is available is low, and that is what I see in the quoted sentence.
Re: Yacc is dead
by sundialsvc4 (Abbot) on Dec 09, 2010 at 21:15 UTC

    Nice title.   Very catchy.   No usable content.

    Obviously, when you have a parsing job to do, you (first of all...) should use a parser. (“Regular Expression Hell” is certainly not a place you want to be, “even if it work(ed), once...”) But is it in any way useful to say that, either with a headline-only title, or with a vague and unspecific blanket slam of a programming language? Not useful to anyone.

    Certainly, one should be aware of the many powerful parsers that are available in (or by) Perl. For instance, I have recently been working on what has turned into a very large application-understanding project which uses Parse::RecDescent. I freely acknowledge that to have tried to do such a thing by writing “Regex Hell” myself would have been silly ... how much better to have a CPAN module take care of doing that for me! ;-) The outcome has been perfectly satisfactory, especially given that I am doing a very inexact parse ... “gleaning” useful information from files (SAS®, DB2®, and Korn Shell scripts... thousands of them...) whose general structure is only approximately predictable. I have no reason for complaint concerning this excellent (pure Perl) tool.

    (And yes, yacc has always done everything I have ever asked of it, too.)

    Obviously, each parsing job is different, and so each parsing tool is, too.   As with all tool-selection, the challenge is to select the right tool, for this job, at this time.   The only thing that really matters when solving a problem is, how you choose to approach the problem.   Not which language you use (within reason).   Any other assertion is attention-grabbing and puerile...

      Nice title. Very catchy. No usable content.

      Indeed, a catchy title. Talk about the "death of ..." (Perl for example :-) and you'll immediately get an audience.

      I believe there is still some usable content in the paper, however.

      Yes, the approach is the important thing, but the language you use (within reason) matters: it constrains your approach, the number of tools you can use and the maintenance of your application.

        /me nods...

        Understood.   “Within reason.”   But let the record show that new languages pop up every couple years, each one promising to be the savior, and soon enough there are new mountains of offal being written in every single one of them.   Your fundamental approach to the problem is the most important tool.

Re: Yacc is dead
by Anonymous Monk on Dec 08, 2010 at 20:04 UTC

    Quite frankly, who gives a f***?

    Whilst they are wasting time writing papers and jumping up and down about people not understanding LR(k) parsers, context-free grammars, Uncle Tom Cobley and all, Perl programmers are writing code that does the job. And the next. And the next.

Re: Yacc is dead
by JavaFan (Canon) on Dec 08, 2010 at 21:34 UTC
    Should we run to implement the derivative parsers described in the paper in Perl?
    I don't care what you do, but why do you even consider asking me to do it? I guess from the way you phrase the question that you see a case for people to implement derivative parsers, but I don't see it, and you don't share your views. All you do is briefly summarize a paper, and then ask whether we should rally.

    Come on man, if your views are strong enough that you consider mobilizing Perl programmers, spit them out. Be vocal. Write a paper. Publish it. Write some more. Get people behind whatever views you have. *THEN* come back to organize an army.

    Personally, I don't give a flying fuck whether someone states that yacc is dead. Or whether that someone has opinions about parsers written in Perl.

      There was a short paper at Inforum 2010 analyzing the current state of parsers in Perl. See "Parser Generation in Perl: an Overview and Available Tools" by Hugo Areias et al.

      They conclude:

      Parser generators in Perl still lacks valuable mechanisms to make them challengeable when compared with other languages, like C. There is no valid support for attribute grammars and, according to the research made, there is only one module on CPAN that supports attribute grammars that, however, lacks of maintenance for several years now.

      I have myself tried to improve the yacc-like conflict resolution mechanism with a new mechanism called "Postponed Conflict Resolution" (PPCR). You can see how it is used in several Eyapp examples; see the following files in the t/ directory of the Eyapp distribution:
      1. dynamic.eyp
      2. pascalnestedeyapp3
      3. CplusplusNested2.eyp
      The first shows how you can use PPCR to dynamically change the parsing and consequently the abstract syntax trees on the fly.

      The second solves the problem of enumerated types versus range types that appears in Extended Pascal. It has been used in the Bison manual to illustrate the power of GLR.

      The third one solves the well known C++ ambiguity between certain declarations and statements.

      To compile them, install Parse::Eyapp and follow the perldoc instructions in the grammars.

      I still believe it is important for the Perl community to cover some important parsing algorithms. There is currently no implementation of GLR on CPAN, and GLR is important not only when writing translators for DSLs but also in Natural Language Processing.

      Write a paper. Publish it. Write some more. Get people behind whatever views you have. *THEN* come back to organize an army.

      We have already written a paper. And there are three more on the subject awaiting review. I hope they will be accepted.

      But no army yet :-). Just another person.

Re: Yacc is dead
by casiano (Pilgrim) on Dec 29, 2010 at 13:27 UTC
    To complete the discussion in this node, consider reading the article Killing Yacc: 1, 2 & 3 by Jeffrey Kegler, written on December 15, 2010.
