
Re: Perl: Extracting specific text from a .txt file and outputting into a new format

by SimonClinch (Chaplain)
on Nov 18, 2010 at 15:57 UTC (#872287=note)


in reply to Perl: Extracting specific text from a .txt file and outputting into a new format

I frequently have to parse all kinds of output, and every case is different, but I tend to go through the following process. Each step makes the next one trivial once you get the hang of it:

1) identify the lexical structure of the material -- can it be multiline, does indentation matter, etc.?

2) create a simple lexical analyser out of a hash of regexes and token names.

3) create a thrower or two that discards whitespace, empty lines, comments, etc.

4) create a trivial parser that calls the trivial lexer and thrower, with one subroutine for each type of opening landmark (an identifying string that starts a section). Each subroutine typically loads the section into a suitable structure, or prints it directly when the section ends (at its closing landmark).

5) if not printing as we go, traverse and print the structure
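The five steps can be sketched end to end as a single Perl script. This is a minimal sketch under stated assumptions: the input format (`SECTION name` / `key = value` / `ENDSECTION` blocks with `#` comments) and the names `make_lexer`, `make_thrower` and `parse` are hypothetical, invented for illustration. It also swaps the hash of regexes for an ordered array of pattern/token pairs so that the match order is predictable: here the comment pattern has to win over the punctuation pattern, since `#` is punctuation too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 2: lexer rules as [pattern, token name] pairs, tried in order.
my @LEX = (
    [ qr/#[^\n]*/,     'TOK_COMMENT' ],
    [ qr/\s+/,         'TOK_SPACE'   ],
    [ qr/\w+/,         'TOK_ID'      ],
    [ qr/[[:punct:]]/, 'TOK_PUNCT'   ],
);

# Returns a closure yielding (token name, matched text) pairs.
sub make_lexer {
    my ($fh) = @_;
    my $buf = '';
    return sub {
        while (1) {
            if ( $buf eq '' ) {    # refill from the next input line
                my $line = <$fh>;
                return ( 'TOK_EOF', '' ) unless defined $line;
                $buf = $line;
            }
            for my $rule (@LEX) {
                my ( $re, $tok ) = @$rule;
                return ( $tok, $1 ) if $buf =~ s/\A($re)//;
            }
            # No pattern matched: drop one character and report it.
            my $bad = substr( $buf, 0, 1, '' );
            warn "unhandled character '$bad' at input line $.\n";
        }
    };
}

# Step 3: the thrower skips tokens the parser never needs to see.
sub make_thrower {
    my ($lexer) = @_;
    return sub {
        while (1) {
            my ( $tok, $val ) = $lexer->();
            next if $tok eq 'TOK_SPACE' or $tok eq 'TOK_COMMENT';
            return ( $tok, $val );
        }
    };
}

# Step 4: the parser, with one handler per opening landmark.
sub parse {
    my ($next) = @_;
    my @sections;
    my %handler = (
        'SECTION' => sub {
            my ( undef, $name ) = $next->();
            my %fields;
            while (1) {
                my ( $tok, $key ) = $next->();
                die "unexpected end of input in section $name\n" if $tok eq 'TOK_EOF';
                last if $key eq 'ENDSECTION';    # closing landmark
                my ( undef, $eq ) = $next->();
                die "expected '=' after '$key'\n" unless $eq eq '=';
                my ( undef, $value ) = $next->();
                $fields{$key} = $value;
            }
            push @sections, { name => $name, fields => \%fields };
        },
    );
    while (1) {
        my ( $tok, $val ) = $next->();
        last if $tok eq 'TOK_EOF';
        my $h = $handler{$val} or die "no handler for '$val'\n";
        $h->();
    }
    return \@sections;
}

# Step 5: traverse and print the structure.
my $text = <<'INPUT';
# hypothetical report
SECTION alpha
  status = ok
  count = 3
ENDSECTION
SECTION beta
  status = failed
ENDSECTION
INPUT

open my $fh, '<', \$text or die "cannot open in-memory input: $!";
my $sections = parse( make_thrower( make_lexer($fh) ) );
for my $s (@$sections) {
    my $f = $s->{fields};
    print "$s->{name}: ", join( ', ', map { "$_=$f->{$_}" } sort keys %$f ), "\n";
}
```

Running it prints `alpha: count=3, status=ok` followed by `beta: status=failed`. Keeping the thrower as a separate wrapper means the lexer stays generic, and the parser only ever sees meaningful tokens.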

Update: code example of a lexer

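```perl
package logparse;
use strict;
use warnings;

# Constructor: takes the filehandle to read from.
sub new {
    my ( $class, $fh ) = @_;
    return bless {
        FH     => $fh,
        BUFFER => '',
        LEX    => {
            '\w+'          => 'TOK_ID',
            '[[:punct:]]+' => 'TOK_PUNCT',
            '\s+'          => 'TOK_SPACE',    # for the thrower to discard
            # and so on for all character classes you identify
        },
    }, $class;
}

# Returns the next token name; the matched text is left in $self->{LEXVAL}.
sub lex {
    my $self = shift;
    my $fh   = $self->{FH};

    if ( $self->{BUFFER} eq '' ) {    # refill from the next input line
        my $line = <$fh>;
        if ( !defined $line ) {
            $self->{LEXVAL} = '';
            return 'TOK_EOF';
        }
        $self->{BUFFER} = $line;
    }

    # keys() rather than each(): returning from inside an each() loop would
    # leave the hash iterator part-way through on the next call.
    for my $pat ( keys %{ $self->{LEX} } ) {
        $self->{BUFFER} =~ /^($pat)(.*)$/s or next;
        $self->{BUFFER} = $2;
        $self->{LEXVAL} = $1;
        return $self->{LEX}{$pat};
    }

    # Nothing matched: consume one character and report it.
    $self->{LEXVAL} = substr( $self->{BUFFER}, 0, 1, '' );
    warn "unhandled content at input line $.\n";
    return '';
}

1;
```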

One world, one people

