PerlMonks  

How to parse generic reports?

(#184173: categorized question)
Contributed by justanyone on Jul 22, 2002 at 18:39 UTC

Description:

On my project, I have to parse many corporate reports (>50 of them) and put the extracted data into a database.

What is the easiest way to parse reports? I'm hoping someone has already come up with a generic report parsing engine, where you specify the layout somehow along with how to put data in a database.

Has anyone solved the general problem already? Is there such an animal, or are we inventing something entirely new? People have been creating computer reports since 1965; you'd think someone would have invented a parsing engine by now. Any hints? Any ideas? I'm open to anything that will cut down our workload.

Answer: How to parse generic reports?
contributed by erikharrison

What you're looking for is a parser generator. You specify a template (called a "grammar") to the parser generator and it spits out Perl code to parse texts conforming to that template. Two well-known parser generators in the Perl world are Parse::Yapp and Parse::RecDescent.
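
To make the grammar idea concrete, here is a minimal sketch using Parse::RecDescent, which builds the parser from the grammar at run time. It assumes a hypothetical report with one "NAME AMOUNT" record per line; the rule names (report, record, name, amount) and the parse_report helper are invented for illustration and do not come from any real report layout.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical report layout: one "NAME AMOUNT" record per line.
# Rule names are invented for this sketch.
my $grammar = q{
    report : record(s) /\z/   { $return = $item[1] }
    record : name amount      { $return = { name => $item[1], amount => $item[2] } }
    name   : /[A-Za-z]+/
    amount : /\d+\.\d\d/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# Returns an array ref of { name, amount } hashes, or undef if the
# text does not conform to the grammar.
sub parse_report {
    my ($text) = @_;
    return $parser->report($text);
}
```

Calling parse_report("widgets 100.25\ngadgets 7.50\n") should yield one hash per record, ready to be handed to a database insert. Each new report type would need its own grammar, so this trades code for specification rather than removing the per-report work.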

Answer: How to parse generic reports?
contributed by Anonymous Monk

My solution: define each type of report line as an unpack TEMPLATE.
(You could use my little piece of code to help you with this -- Fixed length file layout - cut2fmt 2).
Once you've got your templates, use regexes to identify the line type, then unpack to get the fields.
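
The approach above might look roughly like this. It is only a sketch: the line types (HDR, DTL), the column widths, and the field names are all assumptions made up for illustration, and parse_line is a hypothetical helper, not part of cut2fmt.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layouts. Each line type has a regex that
# recognizes it, an unpack template, and names for the unpacked fields.
my %layout = (
    header => {
        match    => qr/^HDR/,
        template => 'A3 x A8',
        fields   => [qw(tag date)],
    },
    detail => {
        match    => qr/^DTL/,
        template => 'A3 x A10 x A8',
        fields   => [qw(tag account amount)],
    },
);

# Identify the line type by regex, then unpack it into named fields.
# Returns a hash ref, or nothing if no layout matches the line.
sub parse_line {
    my ($line) = @_;
    for my $type (sort keys %layout) {
        my $spec = $layout{$type};
        next unless $line =~ $spec->{match};
        my %rec;
        @rec{ @{ $spec->{fields} } } = unpack $spec->{template}, $line;
        s/^\s+// for values %rec;    # 'A' strips trailing spaces only
        return { type => $type, %rec };
    }
    return;
}
```

Adding a report line type then means adding one entry to %layout rather than writing new parsing code. Note that unpack dies if an "x" skip runs past the end of a short line, so real input may need a length check first.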

Answer: How to parse generic reports?
contributed by osfameron

From the terminology used ("generic report parser"), I'm guessing that Parse::RecDescent would be overkill, or too complex for the target audience?

I've just posted a proof of concept: Parse::Report - parse Perl format-ed reports.
