http://www.perlmonks.org?node_id=184173

justanyone has asked for the wisdom of the Perl Monks concerning the following question:

On my project, I have to parse many corporate reports (more than 50 of them) and put the extracted data into a database.

What is the easiest way to parse reports? I'm hoping someone has already written a generic report-parsing engine, where you describe the layout somehow and specify how the extracted data goes into a database.

Has anyone already solved the general problem? Is there such an animal, or are we inventing something entirely new? People have been producing computer reports since 1965; you'd think someone would have written a parsing engine by now. Any hints? Any ideas? I'm open to anything that will cut down our workload.

Originally posted as a Categorized Question.

Re: How to parse generic reports?
by erikharrison (Deacon) on Jul 23, 2002 at 18:36 UTC
    What you're looking for is a parser generator. You specify a template (called a "grammar") to the parser generator and it spits out Perl code to parse texts conforming to that template. Two well-known parser generators in the Perl world are Parse::Yapp and Parse::RecDescent.
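    As a rough sketch of the grammar approach, here is a Parse::RecDescent parser for a hypothetical report with one "DATE AMOUNT NAME" record per line. The report layout, the rule names and the sample data are assumptions made up for illustration; real reports with headers, page breaks and totals would need more rules.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical report format: one record per line, "DATE AMOUNT NAME".
# By default Parse::RecDescent skips whitespace (including newlines)
# before each terminal, so no explicit line-break tokens are needed.
my $grammar = q{
    report : record(s) /\Z/     { $item[1] }
    record : date amount name
             { $return = { date => $item{date}, amount => $item{amount}, name => $item{name} } }
    date   : /\d{4}-\d{2}-\d{2}/
    amount : /-?\d+\.\d{2}/
    name   : /[^\n]+/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Grammar did not compile\n";

my $text = "2002-07-23 1234.50 ACME Corp\n2002-07-24 -99.00 Widget Ltd\n";

# Calling the start rule as a method returns its value, or undef on failure.
my $rows = $parser->report($text)
    or die "Report did not match the grammar\n";

printf "%s|%s|%s\n", @{$_}{qw(date amount name)} for @$rows;
```

    The start rule hands back an array of hashrefs, one per record, which is a convenient shape for inserting into a database.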

Re: How to parse generic reports?
by Anonymous Monk on Aug 01, 2002 at 23:32 UTC
    My solution: define each type of report line as an unpack TEMPLATE.
    (You could use my little piece of code, "Fixed length file layout - cut2fmt 2", to help with this.)
    Once you have your templates, use regexes to identify each line's type, then unpack the line to get its fields.
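    A minimal sketch of this approach in core Perl, assuming a hypothetical report with a header line and fixed-width detail lines. The layout table, column widths, field names and the parse_line helper are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout table: for each line type, a regex that recognises it,
# an unpack TEMPLATE for its fixed-width columns, and the field names.
my @layouts = (
    {
        type   => 'header',
        match  => qr/^REPORT /,
        format => 'x7 A10',          # skip "REPORT ", then a 10-char date
        fields => [qw(date)],
    },
    {
        type   => 'detail',
        match  => qr/^\d{6} /,
        format => 'A6 x A20 A10',    # id, 1 blank, name, right-aligned amount
        fields => [qw(id name amount)],
    },
);

# Return a hashref for a recognised line, or undef for anything else.
sub parse_line {
    my ($line) = @_;
    for my $layout (@layouts) {
        next unless $line =~ $layout->{match};
        my @values = unpack($layout->{format}, $line);
        s/^\s+// for @values;        # 'A' strips trailing blanks; trim leading ones too
        my %record;
        @record{ @{ $layout->{fields} } } = @values;
        return +{ type => $layout->{type}, %record };
    }
    return;                          # unrecognised line
}
```

    Lines that match no layout (page footers, blank lines) come back as undef and can be skipped. Because the layouts are plain data, supporting another report means adding entries to the table, and each parsed record could then be passed to a DBI statement handle's execute() for insertion.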

Re: How to parse generic reports?
by osfameron (Hermit) on Aug 14, 2002 at 00:27 UTC