http://www.perlmonks.org?node_id=184173

justanyone has asked for the wisdom of the Perl Monks concerning the following question:

On my project, I have to parse many corporate reports (>50 of them) and put the extracted data into a database.

What is the easiest way to parse reports? I'm hoping someone has already come up with a generic report parsing engine, where you specify the layout somehow along with how to put data in a database.

Has anyone solved the general problem already? Is there such an animal, or are we inventing something entirely new? People have been creating computer reports since 1965; you'd think that someone would have invented a parsing engine already. Any hints? Any ideas? I'm open to anything that will cut down our workload.

Originally posted as a Categorized Question.

Re: Parse generic reports
by Anonymous Monk on Jul 24, 2003 at 19:17 UTC
    I have been using a generic report parsing tool, mttex, for a similar project. You can get it from http://www.medullatek.com/mttex.htm. Hope this helps. -Anshul

    Originally posted as a Categorized Answer.

Re: Parse generic reports
by herveus (Prior) on Jul 23, 2002 at 18:24 UTC
    Howdy!

    I don't think a general solution is either available or practical (or desirable).

    Are these reports in a consistent format? Are they text? Postscript? RTF? MS Word?

    Are they structured in any way? Free-form?

    yours,
    Michael

    Originally posted as a Categorized Answer.

Re: How to parse generic reports?
by erikharrison (Deacon) on Jul 23, 2002 at 18:36 UTC
    What you're looking for is a parser generator. You specify a template (called a "grammar") to the parser generator and it produces a Perl parser for texts conforming to that template. Two well-known parser generators in the Perl world are Parse::Yapp and Parse::RecDescent.

Re: How to parse generic reports?
by Anonymous Monk on Aug 01, 2002 at 23:32 UTC
    My solution: define each type of report line as an unpack TEMPLATE.
    (You could use my little piece of code to help you with this: see the node "Fixed length file layout - cut2fmt 2".)
    Once you've got your templates, use regexes to identify the line type, then unpack to get the fields.
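
    A minimal sketch of that approach, using only core Perl. The report layout here (a "HDR" header line and a detail line starting with a six-digit account number), the field names, and the parse_line helper are all hypothetical; they are not taken from any real report or from the cut2fmt code. Each line type pairs a regex that recognizes it with an unpack template that splits it into fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout table: one entry per type of report line.
# 'A' fields strip trailing spaces, so each width includes the
# one-column separator that follows the field.
my @line_types = (
    {
        name     => 'header',
        match    => qr/^HDR/,
        template => 'A4 A11 A8',
        fields   => [qw(tag report_id date)],
    },
    {
        name     => 'detail',
        match    => qr/^\d{6} /,
        template => 'A7 A21 A10',
        fields   => [qw(account description amount)],
    },
);

# Identify the line type with a regex, then unpack it into named fields.
# Returns a hash reference, or nothing if no line type matches.
sub parse_line {
    my ($line) = @_;
    for my $type (@line_types) {
        next unless $line =~ $type->{match};
        my %record;
        @record{ @{ $type->{fields} } } = unpack($type->{template}, $line);
        s/^\s+// for values %record;    # right-justified fields keep leading spaces
        $record{type} = $type->{name};
        return \%record;
    }
    return;
}

# Sample lines built with sprintf so the column widths are explicit.
my @report = (
    sprintf('%-3s %-10s %-8s',  'HDR',    'SALES2002',    '20020801'),
    sprintf('%-6s %-20s %10s', '100234', 'Widget sales', '1523.75'),
    '',    # blank lines match no type and are skipped
);

for my $line (@report) {
    my $rec = parse_line($line) or next;
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
```

    Supporting another report then means adding entries to the layout table rather than writing new parsing code, and each record hash can be handed to whatever does the database insert.
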
Re: How to parse generic reports?
by osfameron (Hermit) on Aug 14, 2002 at 00:27 UTC
Re: Parse generic reports
by tponnier (Initiate) on Aug 21, 2012 at 05:28 UTC
    I think IntelliGet (MountOne Technologies) is the most suitable tool for parsing reports. I have been using it for the last two years to extract data from human-readable reports I receive from my vendors. The tool is very generic, and you can define your templates and transformations pretty quickly. They have good support as well.

    Originally posted as a Categorized Answer.