PerlMonks

Re^2: Block-structured language parsing using a Perl module?

by BrowserUk (Pope)
on Aug 15, 2012 at 04:07 UTC (#987496=note)


in reply to Re: Block-structured language parsing using a Perl module?
in thread Block-structured language parsing using a Perl module?

The problem with those is they are

  • either: hand-crafted parsers constructed to parse a specific language (HTML).

    These are no use because I'm looking for a parser constructor module.

  • or: examples of using the parser constructor module to construct a parser for some more or less complicated language, written by the author of the module that does the construction.

    It is unsurprising that the author of a given module is motivated enough, and reasonably adept at using his own module, to persist in getting something moderately complicated to work.

    But can anyone else?

If I could find an example of a parser module being used a) in a real-world project; b) of reasonable complexity; c) by someone other than its author; it would give some level of confidence that the module stands up to a) being learned; b) being debugged; c) being maintained in a timely fashion when bugs discovered through real-world usage are reported.

Of the 3 modules I've experimented with, they:

  • had awful APIs -- large, complicated, verbose -- with lousy documentation, as often as not couched in so much academic/theoretical terminology as to be almost unintelligible.

    I want to use a parser; not learn about the theory behind them.

  • gave almost useless error diagnostics when defining the grammar, and even worse diagnostics when given non-compliant source to parse.
  • were so ridiculously slow in operation that they are almost useless for real-world usage.
  • produced parse trees so complicated you need to write another parser to process them.

I can see I am going to end up writing my own; but given the richness of the modules on CPAN, I hoped that there was one amongst them that might stand up to real-world usage.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^3: Block-structured language parsing using a Perl module?
by Anonymous Monk on Aug 15, 2012 at 12:38 UTC

    :) I know this probably doesn't qualify either (and you probably saw it), but GraphViz2::Marpa is not by the Marpa author :) though it is also accompanied by a how-to article.

    FWIW, the Marpa guy does give some praise to his error diagnostics on his blog :)

      Thank you. Particularly for answering my actual question :)

      After a more-than-cursory, less-than-comprehensive assessment, Marpa seems like the real deal technically: fully capable of parsing anything I'm likely to need, doing so efficiently, and giving good information when things go wrong.

      But... Why, oh why, do technically very competent programmers -- as the author obviously is -- write such dog-awful interfaces? And such piss-poor documentation?


Re^3: Block-structured language parsing using a Perl module?
by chromatic (Archbishop) on Aug 16, 2012 at 23:03 UTC
    I want to use a parser; not learn about the theory behind them.

    Part of the problem of lousy documentation is that you have to know enough theory to know both what type of parser you can use on a grammar and if your grammar is even parsable. Because semi-structured text can vary so much in structure and meaning, the best any general-purpose grammar engine can do is push back on you a little bit to figure out whether your language is a regular language, whether you need lookahead and how much, and how you handle things like recursion, if at all.
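To make the "regular language vs. recursion" point concrete: a block-structured language nests { ... } to arbitrary depth, which a plain regular expression cannot track, whereas even a simple depth counter can. A minimal Python sketch, assuming a toy brace-only syntax; TWO_LEVELS and balanced() are hypothetical names for illustration. (Perl 5.10 added recursive regex constructs such as (?R), which Python's re module lacks, so this shows the textbook limitation rather than what Perl's engine can do.)

```python
import re

# A regex WITHOUT recursion can only describe nesting to a fixed depth.
# This one accepts at most two levels of {...}; anything deeper is rejected.
TWO_LEVELS = re.compile(r"\{(?:[^{}]|\{[^{}]*\})*\}")


def balanced(text: str) -> bool:
    """Check arbitrarily deep brace nesting with a depth counter."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no matching opener
                return False
    return depth == 0


shallow = "{ a { b } }"
deep = "{ a { b { c } } }"
print(bool(TWO_LEVELS.fullmatch(shallow)), balanced(shallow))  # True True
print(bool(TWO_LEVELS.fullmatch(deep)), balanced(deep))        # False True
```

Supporting one more level means rewriting the pattern; the counter (a degenerate stack) needs no change, which is the kind of distinction a grammar engine forces you to think about.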

    Also a lot of the theoretical work comes from the world of linguistics, which is messy on its own.

    I agree about lousy APIs though.

    I can't speak about the performance of Regexp::Grammars, but if I were doing something like this, I'd start there for ease of use. I'd use Marpa for speed and completeness.

      you have to know enough theory to know both what type of parser you can use on a grammar and if your grammar is even parsable.

      Hm. That would be a justification for it, but so far, none of the modules under discussion even begins to allow you to answer those types of questions.

      I can't speak about the performance of Regexp::Grammars, but if I were doing something like this, I'd start there for ease of use.

      Hm. I'm going through the docs for Regexp::Grammars now, and trying to do so with an open mind, but honestly, what I'm reading is making my skin crawl.

      The questions I am asking myself at this point are:

      • Why do I need to add/remove directives to my grammar in order to enable/disable debugging?

        For the same reason that these errors exist. Because the module trades a conventional interface for overloading qr//.

        The POD devotes a section to restricting the scope of the module's effect, by wrapping its use in a do block to prevent it from messing with other qr//s in the program. I can see no advantages to the cutesy interface over use Regexp::Grammars qw[ compile ]; my $re = compile $grammar; and at least 3 disadvantages.

      • Why complicate the interface by adding a logfile option?

        Just send errors, debug and trace output to STDERR and let me redirect it -- manually or programmatically -- wherever I want it.

      • Dumping 3 gigatonnes of trace to a file does not constitute a debugger and is no substitute for decent diagnostics.
      • Why are there 3 aliases for everything, and 3 ways to do everything, including creating aliases?
      • Why do I need both a subroutine-based interface and an OO interface?

        What benefit is there to an OO interface?

        Does it avoid the use of globals like %MATCH and %/ etc? No.

        Can I run two instances of a grammar concurrently? No. The regex engine wouldn't allow it.

        It's just pseudo-OO.

      I'd use Marpa for speed and completeness.

      The trouble with Marpa is that it only does half the job. You have to tokenise the source text yourself, and then feed it to the parser in labeled chunks.

      By the time you've written the code to tokenise the input, and then recognise the tokens so you can label them for the "parser", one wonders what parsing there is left for the "parser" to do.

      It's like buying a dog and barking yourself.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

        By the time you've written the code to tokenise the input, and then recognise the tokens so you can label them for the "parser", one wonders what parsing there is left for the "parser" to do.

        Apparently that is "lexing", and parsing is making sure the tokens are in the allowed order.
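To illustrate that division of labour: the lexer below only chops text into labelled chunks and has no idea whether they appear in a sensible order, while the parser checks that order (and builds a tree). This is a minimal Python sketch under an assumed toy grammar (program := stmt*, stmt := IDENT ';' | IDENT '{' stmt* '}'); lex(), Parser and parse() are hypothetical names, and none of it is Marpa's or any other module's actual API.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (label, text)

# Lexing: split text into labelled chunks. Knows nothing about order.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<IDENT>[A-Za-z_]\w*)|(?P<LBRACE>\{)|(?P<RBRACE>\})|(?P<SEMI>;))"
)


def lex(text: str) -> List[Token]:
    tokens: List[Token] = []
    text = text.rstrip()  # trailing whitespace would otherwise be left unmatched
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Parsing: check the labelled tokens come in an allowed order."""

    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.i = 0

    def peek(self) -> str:
        return self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"

    def expect(self, label: str) -> str:
        if self.peek() != label:
            raise SyntaxError(f"expected {label}, got {self.peek()} at token {self.i}")
        text = self.tokens[self.i][1]
        self.i += 1
        return text

    def program(self) -> list:
        body = []
        while self.peek() != "EOF":
            body.append(self.stmt())
        return body

    def stmt(self):
        name = self.expect("IDENT")
        if self.peek() == "SEMI":  # simple statement: IDENT ';'
            self.expect("SEMI")
            return name
        self.expect("LBRACE")  # block: IDENT '{' stmt* '}'
        body = []
        while self.peek() not in ("RBRACE", "EOF"):
            body.append(self.stmt())
        self.expect("RBRACE")
        return (name, body)


def parse(text: str) -> list:
    return Parser(lex(text)).program()


print(parse("outer { a; inner { b; } c; }"))
# [('outer', ['a', ('inner', ['b']), 'c'])]
```

Note that lex("a { b; ") succeeds, since every chunk is a valid token, but parse("a { b; ") raises SyntaxError because the block is never closed: the order check is the part left for the "parser" to do.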
