PerlMonks  

Re^3: Block-structured language parsing using a Perl module?

by chromatic (Archbishop)
on Aug 16, 2012 at 23:03 UTC (#987888)


in reply to Re^2: Block-structured language parsing using a Perl module?
in thread Block-structured language parsing using a Perl module?

I want to use a parser; not learn about the theory behind them.

Part of the problem with lousy documentation is that you have to know enough theory to know both what type of parser you can use on a grammar and whether your grammar is even parsable. Because semi-structured text can vary so much in structure and meaning, the best any general-purpose grammar engine can do is push back on you a little to figure out whether your language is regular, whether you need lookahead (and how much), and how, if at all, you handle things like recursion.
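To make the recursion point concrete: arbitrarily deep block nesting is the standard example of structure that a regular language (in the formal sense) cannot describe, so the parser has to recurse or keep an explicit stack. Below is a minimal recursive-descent sketch in Python, assuming a made-up toy syntax in which a block is `{`, then words and nested blocks, then `}`. The syntax and the names `tokenize`, `parse_block` and `parse` are hypothetical and do not come from any of the modules discussed here.

```python
import re

# Hypothetical toy syntax:  block := "{" { WORD | block } "}"
TOKEN_RE = re.compile(r"\s*(?:(\{)|(\})|(\w+))")


def tokenize(text):
    """Split text into '{', '}' and word tokens; reject anything else."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        tokens.append(m.group(m.lastindex))  # whichever group matched
        pos = m.end()
    return tokens


def parse_block(tokens, i):
    """Parse one block starting at tokens[i], which must be '{'.

    Returns (body, next_index): body is a list of words and nested
    lists. The recursion depth mirrors the nesting depth of the input.
    """
    if i >= len(tokens) or tokens[i] != "{":
        raise SyntaxError(f"expected '{{' at token {i}")
    i += 1
    body = []
    while i < len(tokens) and tokens[i] != "}":
        if tokens[i] == "{":
            child, i = parse_block(tokens, i)  # recurse into nested block
            body.append(child)
        else:
            body.append(tokens[i])
            i += 1
    if i >= len(tokens):
        raise SyntaxError("unclosed block")
    return body, i + 1  # skip the closing '}'


def parse(text):
    """Parse exactly one top-level block and reject trailing input."""
    tokens = tokenize(text)
    body, end = parse_block(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input after block")
    return body
```

For example, `parse("{ if x { y { z } } w }")` returns `['if', 'x', ['y', ['z']], 'w']`: each `{` starts a new call and each `}` returns from one, which is exactly the counting a finite automaton cannot do.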

Also a lot of the theoretical work comes from the world of linguistics, which is messy on its own.

I agree about lousy APIs though.

I can't speak about the performance of Regexp::Grammars, but if I were doing something like this, I'd start there for ease of use. I'd use Marpa for speed and completeness.


Re^4: Block-structured language parsing using a Perl module?
by BrowserUk (Pope) on Aug 17, 2012 at 07:22 UTC
    you have to know enough theory to know both what type of parser you can use on a grammar and if your grammar is even parsable.

    Hm. That would be a justification for it, but so far, none of the modules under discussion even begins to let you answer those types of questions.

    I can't speak about the performance of Regexp::Grammars, but if I were doing something like this, I'd start there for ease of use.

    Hm. I'm going through the docs for Regexp::Grammars now, and trying to do so with an open mind, but honestly, what I'm reading is making my skin crawl.

    The questions I am asking myself at this point are:

    • Why do I need to add/remove directives to my grammar in order to enable/disable debugging?

      For the same reason that these errors exist: because the module trades a conventional interface for overloading qr//.

      The POD devotes a section to restricting the scope of the module's effect by wrapping its use in a do block, to prevent it from messing with other qr//s in the program. I can see no advantages to the cutesy interface over use Regexp::Grammars qw[ compile ]; my $re = compile $grammar; and at least 3 disadvantages.

    • Why complicate the interface by adding a logfile option?

      Just send errors, debug output, and trace to STDERR and let me redirect them, manually or programmatically, wherever I want.

    • Dumping 3 gigatonnes of trace to a file does not constitute a debugger and is no substitute for decent diagnostics.
    • Why are there 3 aliases for everything, and 3 ways to do everything, including creating aliases?
    • Why do I need both a subroutine-based interface and an OO interface?

      What benefit is there to an OO interface?

      Does it avoid the use of globals like %MATCH and %/ etc? No.

      Can I run two instances of a grammar concurrently? No. The regex engine wouldn't allow it.

      It's just pseudo-OO.

    I'd use Marpa for speed and completeness.

    The trouble with Marpa is that it only does half the job. You have to tokenise the source text yourself, and then feed it to the parser in labeled chunks.

    By the time you've written the code to tokenise the input, and then recognise the tokens so you can label them for the "parser", one wonders what parsing there is left for the "parser" to do.

    It's like buying a dog and barking yourself.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      By the time you've written the code to tokenise the input, and then recognise the tokens so you can label them for the "parser", one wonders what parsing there is left for the "parser" to do.

      Apparently that is "lexing", and parsing is making sure the tokens are in the allowed order.
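That division of labour can be sketched in a few lines. This is plain Python, not Marpa's API, and the token kinds and the one-rule grammar (`NAME = NUMBER ;`, repeated) are hypothetical: the lexer cuts characters into labelled `(kind, value)` tokens without knowing anything about the grammar, and the parser only checks that the labels arrive in an allowed order.

```python
import re

# Hypothetical token kinds for a toy statement language:  NAME = NUMBER ;
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("EQUALS", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def lex(text):
    """Lexing: cut text into (kind, value) pairs. No grammar knowledge."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":  # drop whitespace, keep everything else
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Parsing: the grammar is  statement := NAME EQUALS NUMBER SEMI , repeated.
STATEMENT = ["NAME", "EQUALS", "NUMBER", "SEMI"]


def parse(tokens):
    """Check that token kinds arrive in the allowed order; build (name, value) pairs."""
    result = []
    for start in range(0, len(tokens), len(STATEMENT)):
        chunk = tokens[start:start + len(STATEMENT)]
        kinds = [kind for kind, _ in chunk]
        if kinds != STATEMENT:
            raise SyntaxError(f"expected {STATEMENT}, got {kinds}")
        result.append((chunk[0][1], int(chunk[2][1])))
    return result
```

`parse(lex("x = 1; y = 22;"))` returns `[('x', 1), ('y', 22)]`, while `parse(lex("x = ;"))` raises `SyntaxError` from the parser rather than the lexer: every character lexes fine, but the token order is wrong.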

        Apparently that is "lexing", and parsing is making sure the tokens are in the allowed order

        Yes. I am aware of that fine academic distinction. It is all fine and dandy in a nice, theoretical world of whitespace-delimited, single-character tokens, but as far as I'm concerned it doesn't cut it in the real world.

        In many (arguably even most) cases, it is not just a hell of a lot easier to work out where the next token ends if you know which alternatives you are expecting; it can be impossible to do so without that information.

        And that means that the hand-written "lexer" you need to write in order to use a Marpa parser has to effectively replicate the state machine that Marpa constructs.

        At which point, what purpose does the parser serve?
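The point about needing to know what you expect can be illustrated with a well-known case from languages with generic types, which is not discussed in the thread itself: a context-free lexer that always takes the longest match reads `>>` in `List<List<int>>` as one right-shift token, whereas a scanner driven by the parser, which tries only the token kind valid at that point, reads two separate `>`. The following is a minimal Python sketch with a hypothetical toy type syntax; the names are illustrative, and it says nothing about how Marpa or Regexp::Grammars handle this.

```python
import re

# Hypothetical toy type syntax:  type := NAME [ "<" type { "," type } ">" ]


def naive_lex(text):
    """Context-free longest-match lexer: always prefers '>>' over '>'."""
    return re.findall(r"\s*(>>|[A-Za-z_]\w*|[<>,])", text)


# Parser-directed scanning: the parser names the token kind it expects
# next, and only that kind's pattern is tried at the current position.
PATTERNS = {
    "NAME": re.compile(r"\s*([A-Za-z_]\w*)"),
    "LT": re.compile(r"\s*(<)"),
    "GT": re.compile(r"\s*(>)"),
    "COMMA": re.compile(r"\s*(,)"),
}


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def try_match(self, kind):
        """Consume a token of `kind` if one starts here; else return None."""
        m = PATTERNS[kind].match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group(1)

    def expect(self, kind):
        value = self.try_match(kind)
        if value is None:
            raise SyntaxError(f"expected {kind} at offset {self.pos}")
        return value


def parse_type(sc):
    name = sc.expect("NAME")
    if sc.try_match("LT") is None:
        return name
    args = [parse_type(sc)]
    while sc.try_match("COMMA") is not None:
        args.append(parse_type(sc))
    sc.expect("GT")  # ask for exactly one '>', so '>>' closes two levels
    return (name, args)


def parse(text):
    sc = Scanner(text)
    tree = parse_type(sc)
    if text[sc.pos:].strip():
        raise SyntaxError(f"trailing input at offset {sc.pos}")
    return tree
```

`naive_lex("List<List<int>>")` ends with a single `'>>'` token, which a grammar expecting two `>` would reject, while `parse("Map<str, List<int>>")` returns `('Map', ['str', ('List', ['int'])])`. The cost is that scanning is now driven by the grammar, which is the coupling between "lexer" and parser described above.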

