How to parse generic reports?

justanyone has asked for the wisdom of the Perl Monks concerning the following question:

On my project, I have to parse many corporate reports (>50 of them) and put the extracted data into a database.

What is the easiest way to parse reports? I'm hoping someone has already come up with a generic report parsing engine, where you specify the layout somehow along with how to put data in a database.

Has anyone solved the general problem already? Is there such an animal, or are we inventing something entirely new? People have been creating computer reports since 1965; you'd think that someone would have invented a parsing engine already. Any hints? Any ideas? I'm open to anything that will cut down our workload.

Originally posted as a Categorized Question.

Re: How to parse generic reports? by erikharrison (Deacon) on Jul 23, 2002 at 18:36 UTC
What you're looking for is a parser generator. You specify a template (called a "grammar") to the parser generator and it spits out Perl code to parse texts conforming to that template. Two well-known parser generators in the Perl world are Parse::Yapp and Parse::RecDescent.
Re: How to parse generic reports? by Anonymous Monk on Aug 01, 2002 at 23:32 UTC
My solution: define each type of report line as an unpack TEMPLATE. (You could use my little piece of code to help you with this -- Fixed length file layout - cut2fmt 2). Once you've got your templates, use regexes to identify the line type, then unpack to get the fields.
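A minimal sketch of that approach, using only core Perl. The line types, column widths, field names, and sample lines are all made up for illustration; a real report would need its own regexes and templates. Each template width includes the single separator column that follows the field, since `A` strips trailing spaces on unpack.

```perl
use strict;
use warnings;

# Hypothetical line types: a regex that recognises the line, an unpack
# template for its fixed-width columns, and names for the resulting fields.
my @line_types = (
    {
        name     => 'header',
        match    => qr/^HDR /,
        template => 'A4 A11 A8',
        fields   => [qw(tag report_id run_date)],
    },
    {
        name     => 'detail',
        match    => qr/^\d{5} /,
        template => 'A6 A21 A9',
        fields   => [qw(account customer amount)],
    },
);

# Return a hashref of fields for a recognised line, or nothing otherwise.
sub parse_line {
    my ($line) = @_;
    for my $type (@line_types) {
        next unless $line =~ $type->{match};
        my @values = unpack $type->{template}, $line;
        s/^\s+// for @values;    # 'A' strips trailing spaces only
        my %record = (type => $type->{name});
        @record{ @{ $type->{fields} } } = @values;
        return \%record;
    }
    return;    # rule lines, blank lines, page breaks, ...
}

# Made-up sample lines; sprintf keeps the columns aligned with the template.
my @sample = (
    'HDR SALES-0042 20020801',
    sprintf('%-5s %-20s %9s', '00017', 'Acme Widgets Ltd', '1234.50'),
    '----- -------------------- ---------',
);

for my $line (@sample) {
    my $rec = parse_line($line) or next;
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
```

Supporting another report then mostly means adding entries to `@line_types`; each hashref that comes back could be handed to whatever code does the database insert.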
Re: How to parse generic reports? by osfameron (Hermit) on Aug 14, 2002 at 00:27 UTC
From the terminology used ("generic report parser"), I'm guessing that Parse::RecDescent would be overkill, or too complex for the target audience? I've just posted a proof of concept: Parse::Report - parse Perl format-ed reports.
Re: Parse generic reports by Anonymous Monk on Jul 24, 2003 at 19:17 UTC
I have been using a generic report parsing tool, mttex, for a similar project. You can get it from http://www.medullatek.com/mttex.htm. Hope this helps. -Anshul Originally posted as a Categorized Answer.
Re: Parse generic reports by herveus (Prior) on Jul 23, 2002 at 18:24 UTC
Howdy! I don't think a general solution is either available or practical (or desirable). Are these reports in a consistent format? Are they text? Postscript? RTF? MS Word? Are they structured in any way? Free-form? yours, Michael Originally posted as a Categorized Answer.
Re: Parse generic reports by tponnier (Initiate) on Aug 21, 2012 at 05:28 UTC
I think IntelliGet (MountOne Technologies) is the most suitable tool to parse reports. I have been using it for the last 2 years to extract data from human-readable reports I receive from my vendors. The tool is very generic, and you can define your templates and transformations pretty quickly. They have good support as well. Originally posted as a Categorized Answer.
