Re: Perl: Extracting specific text from a .txt file and outputting into a new format
by SimonClinch (Deacon) on Nov 18, 2010 at 15:57 UTC
I frequently have to parse all kinds of output, and every case is different, but I tend to go through the following process. Once you get the hang of it, each step makes the next one trivial:
1) identify the lexical structure of the material -- can it be multiline, does indentation matter, etc.?
2) create a simple lexical analyser out of a hash of regexes and token names.
3) create a thrower or two that ejects white space and/or empty lines, comments etc.
4) create a trivial parser that calls the trivial lexer and thrower and has one subroutine per type of opening landmark (an identifying string that starts a section). Each subroutine typically loads the section into a suitable structure, or prints it directly once the closing landmark marks the end of the section.
5) if not printing as we go, traverse and print the structure
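The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the input format (bracketed section headers followed by key = value lines), the token names SECTION and PAIR, and the helper names is_noise, lex, parse and render are all hypothetical. Because a Perl hash is unordered, the sketch keeps a separate list that fixes the order in which the regexes are tried.

```perl
use strict;
use warnings;

# Step 2: a hash of token names and regexes (hypothetical input format).
my %token = (
    SECTION => qr/^\[(\w+)\s+(\w+)\]$/,       # opening landmark, e.g. [host alpha]
    PAIR    => qr/^(\w+)\s*=\s*(.*?)\s*$/,    # e.g. cpu = 12
);
my @order = qw(SECTION PAIR);   # hashes are unordered, so fix the match order

# Step 3: the thrower rejects blank lines and '#' comments.
sub is_noise {
    my ($text) = @_;
    return $text =~ /^\s*(?:#.*)?$/;
}

# Step 2 continued: the lexer returns a token name plus its captures.
sub lex {
    my ($text) = @_;
    for my $name (@order) {
        if (my @caps = $text =~ $token{$name}) {
            return ($name, @caps);
        }
    }
    return ('UNKNOWN', $text);
}

# Step 4: one handler per token type; each updates the parse state.
my %handler = (
    SECTION => sub {
        my ($state, $kind, $name) = @_;
        $state->{current} = { kind => $kind, name => $name, fields => {} };
        push @{ $state->{sections} }, $state->{current};
    },
    PAIR => sub {
        my ($state, $key, $value) = @_;
        # Pairs that appear before any section header are ignored.
        $state->{current}{fields}{$key} = $value if $state->{current};
    },
    UNKNOWN => sub {
        my ($state, $text) = @_;
        warn "unrecognised line: $text\n";
    },
);

sub parse {
    my (@lines) = @_;
    my $state = { sections => [], current => undef };
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # trim, including the newline
        next if is_noise($text);
        my ($name, @caps) = lex($text);
        $handler{$name}->($state, @caps);
    }
    return $state->{sections};
}

# Step 5: traverse the structure and produce one output line per section.
sub render {
    my ($sections) = @_;
    my @out;
    for my $s (@$sections) {
        my $f = $s->{fields};
        my @pairs;
        push @pairs, "$_=$f->{$_}" for sort keys %$f;
        push @out, join ' ', "$s->{kind}:$s->{name}", @pairs;
    }
    return @out;
}

my @sample = (
    "# sample report\n",
    "[host alpha]\n",
    "cpu = 12\n",
    "mem = 512\n",
    "\n",
    "[host beta]\n",
    "cpu = 7\n",
);
print "$_\n" for render(parse(@sample));
```

With the sample above this prints "host:alpha cpu=12 mem=512" and "host:beta cpu=7". In this sketch a new SECTION token also acts as the closing landmark for the previous section; a format with explicit end markers would need one more token and handler.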
Update: code example of a lexer
One world, one people