PerlMonks  

Repository of Parse::RecDescent grammars?

by Maestro_007 (Hermit)
on Aug 07, 2002 at 14:50 UTC (#188336=perlmeditation)

Recently I got an assignment to parse a few files in a format called BAI (Bank Administration Institute), convert them to spreadsheets, and construct elaborate formulas against them.

CPAN to the rescue, of course! Spreadsheet::WriteExcel, Parse::RecDescent, Data::Dumper, and a myriad of other modules made this job a 12-hour job rather than a 100-hour job.

The first task was to learn the BAI format, then write a grammar for it. This is where I spent most of my time. Now that the grammar is good and only needs to be maintained when certain transaction codes change, I want to give this grammar to CPAN so that nobody else will need to go through what I went through.

The problem is that I don't think this needs to be a module, not even a script. All it needs is two text files (one for the grammar and one for the transaction codes), and some perldoc.
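A minimal sketch of the two-file idea, assuming Parse::RecDescent is installed. The record layout and the transaction codes are invented for illustration and are not the real BAI format; the two here-docs stand in for the hypothetical grammar and code files.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Stand-in for a hypothetical grammar.txt (toy layout, NOT real BAI).
my $grammar = <<'END_GRAMMAR';
statement : record(s) /\z/       { $return = $item[1] }
record    : /\d+/ ',' /\d+/      { $return = { code => $item[1], amount => $item[3] } }
END_GRAMMAR

# Stand-in for a hypothetical codes.txt (invented codes and labels).
my $codes_text = <<'END_CODES';
100=deposit
200=withdrawal
END_CODES

# Build a code => label lookup from the support data.
my %label_for;
for my $line (split /\n/, $codes_text) {
    my ($code, $label) = split /=/, $line, 2;
    $label_for{$code} = $label;
}

my $parser  = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $records = $parser->statement("100,2500\n200,1200\n")
    or die "Parse failed\n";

# The caller decides what to do with the parsed records.
for my $r (@$records) {
    my $label = exists $label_for{ $r->{code} } ? $label_for{ $r->{code} } : 'unknown';
    print "$label $r->{amount}\n";
}
```

Nothing here needs a dedicated module: the wrapper is a dozen lines, and only the two data files are specific to the format.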

Do I need to go through writing a module? I don't mind doing it, it just seems unnecessary. It seems to me that the code that wraps this grammar will be the same code that wraps just about any other grammar that fits into this category (e.g. something that's been around for years, doesn't look like it will change soon, etc.).

A possible solution? How about if we were to create one module in CPAN that houses separate grammars for the various industry-standard file formats for, well, whatever industry you need?

The grammars should themselves follow some kind of standard. For example, one grammar could consist of any number of text files, each of which contributes either a) rule definitions or b) support data, and thorough documentation should be supplied.

I've got plenty more ideas on this, but I wanted to throw it out there just so I can work on a Grammars::Standard module instead of a BAI::Format module.

Thoughts?

MM

Re: Repository of Parse::RecDescent grammars?
by ichimunki (Priest) on Aug 07, 2002 at 15:09 UTC
    You might ask theDamian if he'd be willing to include some standard grammars in the Parse::RecDescent tarball-- in addition to the demos folder, a templates folder might be handy... Even if the grammars aren't in module form, having them as a .txt file would be nice. In fact, having a well-stubbed template is about the best one can hope for, since each application may be very different... even if the basic grammar is essentially the same, the bits of Perl code might vary widely.

    Either way, I know I would love to have a prebuilt generic SQL grammar right about now. I found one that's set up for yacc (as an example in an ORA book), but it's going to require some work to get it P::RD ready.
      You might ask theDamian if he'd be willing to include some standard grammars in the Parse::RecDescent tarball-- in addition to the demos folder, a templates folder might be handy...

      If theDamian or someone else tells you that this does not seem to be a good idea, you can give him the hint that this kind of work was already done for regular grammars by theDamian (now maintained by Abigail).

      Hanamaki;-)
Re: Repository of Parse::RecDescent grammars?
by Anonymous Monk on Aug 07, 2002 at 15:55 UTC

    My experience is that industry-standard formats aren't. I mean, yes, they do exist, but more often than not, they're a de facto standard because everybody in that industry uses the same software for the same task. Further, I find that Parse::RecDescent is more often useful to parse either a) Extremely complex grammars like the SQL example above, or b) Proprietary formats of things like part numbers, internal file formats, etc.

    If BAI::Format were truly useful, it would certainly be much more useful to write it as a full-blown module (spit out the code that Parse::RecDescent creates before eval'ing as a starting point), than as a grammar file that needs to be parsed before even starting to process the BAI file(s).
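One way to get the "spit out the generated code" effect is Parse::RecDescent's own Precompile method, which writes the generated parser to a .pm file. This is a sketch under assumptions: the grammar is a toy layout rather than real BAI, and ToyStatement is a made-up class name. The generated module still loads Parse::RecDescent at run time in this basic form.

```perl
use strict;
use warnings;
use lib '.';               # the generated module lands in the current directory
use Parse::RecDescent;

# Toy grammar (invented layout, NOT real BAI).
my $grammar = <<'END_GRAMMAR';
statement : record(s) /\z/   { $return = $item[1] }
record    : /\d+/ ',' /\d+/  { $return = { code => $item[1], amount => $item[3] } }
END_GRAMMAR

# Writes ToyStatement.pm (hypothetical name) into the current directory.
Parse::RecDescent->Precompile($grammar, 'ToyStatement');

# Later runs only need to load the generated module; the grammar text
# is not parsed again.
require ToyStatement;
my $parser  = ToyStatement->new;
my $records = $parser->statement("100,2500\n") or die "Parse failed\n";
print "$records->[0]{code} $records->[0]{amount}\n";
```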

    So, overall, while it'd be nice to have, say, SQL available as a grammar, I don't think most people's Parse::RecDescent creations would be either a) useful to anybody else, or b) be able to be released to the public without having your ass sued off.

      You're probably right about the majority of people's everyday grammars. But I'll bet dollars to donuts there are at least a dozen standards out there that could apply. Medical records? Air-traffic data? Building specifications? If there are a dozen, mightn't that warrant some sort of more general wrapper? This would be in cases where the software people already use is a black-box, and they can't do anything with the data besides what is shoved into a database w/ a limited interface. That's exactly the situation the company was in, so they asked me for help.

      Of course, the maintenance of those standards is an issue. If I didn't know that this standard was fairly well accepted and rarely changed I wouldn't propose it as one of them. Clearly there won't be too many that apply. The idea would be to have grammars for the ones that apply, and to make them flexible.

      Sure, I could create BAI::Format, in fact I already have (though a different name, still need to work on what its CPAN name would be). The problem is that it presumes to know what format you want your data returned in. It doesn't want to know the names of the bank account owners, it wants to know dates and amounts. I'll bet I'll never need the data structures again when I re-use this parser for another project, but I'll still need the parser. If I change it so that all the data is extrapolated into a large hash of hashes, I'll have a huge data structure, only about 15% of which I'll actually need.

      The idea is that I supply the rules for how the file should look, and everyone who downloads the parser will only have to do the easy part, i.e. figuring out what to do with the data. Sure, it'll take some work on their part, but I'll just about guarantee that my solution wouldn't work for them. If someone had done this for my project, leaving all the references to $item[x] out of it, it probably would have taken about 3 hours (less|fewer).
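A sketch of what an action-free grammar could look like, assuming Parse::RecDescent's $::RD_AUTOACTION setting is used to supply a generic default action. The grammar itself contains no $item[x] references; the layout is a toy one, not real BAI, and the consumer code at the bottom is the only part each downloader would write.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Shipped grammar: rules only, no embedded Perl (toy layout, NOT real BAI).
my $grammar = <<'END_GRAMMAR';
statement : record(s) /\z/
record    : /\d+/ ',' /\d+/
END_GRAMMAR

# Default action for every production that lacks one: return the rule
# name followed by the matched items. Must be set before calling new().
$::RD_AUTOACTION = q{ [@item] };

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $tree   = $parser->statement("100,2500\n200,1200\n")
    or die "Parse failed\n";

# Consumer side: pick out only the fields this project cares about.
my (undef, $records) = @$tree;          # element 0 is the rule name
for my $rec (@$records) {
    my (undef, $code, undef, $amount) = @$rec;
    print "$code $amount\n";
}
```

The generic tree costs some memory compared with hand-written actions, but it keeps the published grammar free of any one project's data structures.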

      This still may be a bad idea, especially with the good points you bring up. I guess the real answer will rest on whether or not there are enough of these file formats (that aren't proprietary) to justify it.

      Good comments, thanks.

      MM

        I was in the same boat, and just posted my fragments to Snippets to save the next person the effort.

        (If Abigail-II or TheDamian wants to include that in the regex collection mentioned earlier, that's fine)

        —John

•Re: Repository of Parse::RecDescent grammars?
by merlyn (Sage) on Aug 07, 2002 at 21:10 UTC
    At a minimum, you can post it as an example to the P::RD mailing list, which means it will be archived permanently, and available to others in a search. TheDamian might also notice it and put it in the examples in the distro if it's interesting.

    -- Randal L. Schwartz, Perl hacker

Re: Repository of Parse::RecDescent grammars?
by TheDamian (Priest) on Aug 07, 2002 at 22:29 UTC
    I think a repository of RecDescent grammars is an excellent idea. I don't think they should be included in the module distribution itself for fear of excessive bloat. A couple of alternatives:
    1. Someone could volunteer to maintain a Parse::RecDescent::Grammar pseudo-module, that aggregates all the contributed grammars. There could even be a front-end analogous to Regexp::Common.
    2. We could just use Regexp::Common directly, by providing a top-level $RE{grammar} namespace.
    3. We could ask for a RECDESCENT (pseudo)ID on CPAN and use that directory to house RecDescent grammars.
    4. Someone could set up a web-site that allowed people to submit and then update their grammars.

    I would support any of these approaches, though, unfortunately, I don't have the spare capacity to implement any of them myself. Personally, I favour the P::RD::G approach, since it puts the grammars on the CPAN, in the correct namespace, where search.cpan.org is likely to draw them to the attention of potential beneficiaries.

      We could ask for a RECDESCENT (pseudo)ID on CPAN
      I understand that the CPAN team will (rightfully) not give out pseudo-IDs. However, nothing stops individual contributors from putting things into the CPAN, and then having some editor maintain a README file as a master index pointing to the other contributions. This is similar to how the scripts portion of the archive goes.

      -- Randal L. Schwartz, Perl hacker

Re: Repository of Parse::RecDescent grammars?
by mwest (Acolyte) on Nov 01, 2002 at 02:01 UTC
    I turned up your posting because I am trying to do pretty much the same thing: parse some structured text with Parse::RecDescent and put it into Excel with Spreadsheet::WriteExcel.

    I think you would save me hours if I could have a peek at your code: the grammar, and how to get the parsed values out of the parser and into Excel. I know WriteExcel, but RecDescent is heavy. Can you post in Snippets or email me?

    Thanks.

Node Type: perlmeditation [id://188336]
Front-paged by TStanley