PerlMonks  

Parsing a non-formatted record text file

by TStanley (Canon)
on Feb 16, 2009 at 15:29 UTC ( #744094=perlquestion )
TStanley has asked for the wisdom of the Perl Monks concerning the following question:

My boss has asked me to put together a script to read the results from a series of top commands from our HP-UX system, then put the data into a CSV file. The biggest problem I see is that the data from each individual top command runs together, so there is no obvious way to separate the data into records for easier processing. A small sample of the data is below:

System: hpnclass Thu Feb 12 16:31:02 2009
Load averages: 0.51, 0.45, 0.49
541 processes: 483 sleeping, 58 running
Cpu states: (avg)
LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
1 ? 50 root 152 20 16032K 16032K run 17565:36 24.94 24.89 vxfsd
2 ? 17979 ora102st 154 24 3989M 3588K sleep 384:22 4.65 4.65 ora_s002_pay9
0 ? 2553 ora102st 154 24 3991M 5520K sleep 3308:08 3.47 3.47 ora_s000_pay9
3 ? 6585 root 154 24 250M 63652K sleep 10144:59 3.27 3.27 ucsrvwp
1 ? 17085 ora102st 154 24 3989M 5340K sleep 218:26 2.73 2.72 ora_s001_pay9
System: hpnclass Thu Feb 12 16:36:07 2009
Load averages: 0.52, 0.50, 0.50
520 processes: 461 sleeping, 59 running

The last six lines of each record (the job information) are data that I do not need. Any suggestions as to how I should proceed with this would be appreciated.

TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Re: Parsing a non-formatted record text file
by kennethk (Monsignor) on Feb 16, 2009 at 15:40 UTC
    Is there anything in particular you're hoping to extract? A cursory look suggests a regex along the lines of /^System\:/ could be used to identify heads of records even if the machine name varies. Another choice might be to split records with 'System:' by setting $/ and work from there.
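
A minimal sketch of the second suggestion, setting $/ so that each read returns one snapshot. The sub name split_snapshots and the in-memory filehandle are illustrative assumptions, not part of the original reply, and the approach would mis-split if the literal text "System:" ever appeared inside a record.

```perl
use strict;
use warnings;

# Split concatenated top snapshots into records by treating the literal
# string "System:" as the input record separator.
sub split_snapshots {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = 'System:';    # each read ends at the next "System:"
    my @records;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                 # chomp strips the trailing "System:"
        next unless $chunk =~ /\S/;   # skip the empty chunk before the first record
        push @records, "System:$chunk";    # put the marker back on the front
    }
    close $fh;
    return @records;
}

my $sample = <<'END';
System: hpnclass Thu Feb 12 16:31:02 2009
Load averages: 0.51, 0.45, 0.49
System: hpnclass Thu Feb 12 16:36:07 2009
Load averages: 0.52, 0.50, 0.50
END

my @snapshots = split_snapshots($sample);
print scalar(@snapshots), " records found\n";
print "---\n$_" for @snapshots;
```

Because the separator is consumed at the end of each chunk rather than the start, the sketch re-attaches "System:" to the front of every record after chomp.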
Re: Parsing a non-formatted record text file
by Fletch (Chancellor) on Feb 16, 2009 at 15:44 UTC

    Rather than try and solve that (admittedly harder) problem, consider installing GTop (and libgtop which it sits upon, which I believe works on hpux) and Proc::ProcessTable and use those to obtain the same information directly.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Parsing a non-formatted record text file
by ELISHEVA (Prior) on Feb 16, 2009 at 15:45 UTC

    I note that lines that need to be parsed alike have distinctive beginnings, so I would use that as a starting point. The basic strategy is:

    • identify a regex that matches the beginning of each parse-alike group of lines uniquely
    • define functions to parse each type of line
    • in a while loop
      1. read a line
      2. match it to one of the regexes
      3. call the function associated with that regex

    For an example of how it is done, see Re: Regular expression. You might also find it helpful to look at the larger discussion of which that node was a part: Regular expression. There are several different approaches in that thread.

    Best, beth
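
The read, match, dispatch strategy described above could be sketched roughly as follows. The handler names, the hash keys (system, load1 and so on) and the in-memory sample are assumptions made for illustration; only three of the line types from the posted sample are handled.

```perl
use strict;
use warnings;

# One small parsing function per kind of line. Each stores what it
# finds in the shared record hash.
sub parse_system {
    my ( $line, $rec ) = @_;
    ( $rec->{system} ) = $line =~ /^System:\s+(\S+)/;
}

sub parse_load {
    my ( $line, $rec ) = @_;
    @{$rec}{qw(load1 load2 load3)} =
        $line =~ /([\d.]+),\s*([\d.]+),\s*([\d.]+)/;
}

sub parse_procs {
    my ( $line, $rec ) = @_;
    @{$rec}{qw(sleeping running)} =
        $line =~ /(\d+)\s+sleeping,\s+(\d+)\s+running/;
}

# Pairs of [ regex identifying the line type, function that parses it ].
my @dispatch = (
    [ qr/^System:/,          \&parse_system ],
    [ qr/^Load averages:/,   \&parse_load ],
    [ qr/^\d+\s+processes:/, \&parse_procs ],
);

# Read lines from a filehandle and hand each to the first matching parser.
sub parse_handle {
    my ($fh) = @_;
    my %rec;
    LINE: while ( my $line = <$fh> ) {
        for my $entry (@dispatch) {
            my ( $rx, $handler ) = @$entry;
            if ( $line =~ $rx ) {
                $handler->( $line, \%rec );
                next LINE;
            }
        }
        # lines matching none of the regexes are ignored
    }
    return \%rec;
}

my $sample = <<'END';
System: hpnclass Thu Feb 12 16:31:02 2009
Load averages: 0.51, 0.45, 0.49
541 processes: 483 sleeping, 58 running
END

open my $demo_fh, '<', \$sample or die "Cannot open sample: $!";
my $rec = parse_handle($demo_fh);
print "$_ = $rec->{$_}\n" for sort keys %$rec;
```

Adding support for another line type then only means writing one more function and one more entry in @dispatch.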

Re: Parsing a non-formatted record text file
by Narveson (Chaplain) on Feb 16, 2009 at 16:30 UTC

    Your boss doesn't seem to have given you a formal statement of requirements. That's good news for you: it means you're needed.

    Take a reasonable sample of your data (perhaps slightly larger than you posted here) and manually transcribe it as a CSV file. You will start to notice you have some choices to make.

    Do you want to keep all the data you see? You've told us you don't. So you probably have a finite list of fields to be recorded. Put that list in your column headings.

    A first approximation to your column headings might be Timestamp, LoadAvg1, LoadAvg2, LoadAvg3, SleepingProcesses, RunningProcesses and whatever else you know to be of interest.

    Alternatively, your boss may want a taller, narrower list containing dated name-value pairs. Your guess is better than mine.

    Give the boss your manually transcribed CSV file and ask "Is this what you need?"

    If the answer is yes, the rest is a simple matter of programming.
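
To make the wide versus tall choice concrete, here is a small sketch that renders the same snapshot both ways. The column names come from the suggestion above; the sub names and the "Timestamp,Name,Value" header for the tall layout are assumptions. It joins fields with plain commas, which is only safe while no value contains a comma or quote.

```perl
use strict;
use warnings;

my @columns = qw(Timestamp LoadAvg1 LoadAvg2 LoadAvg3
                 SleepingProcesses RunningProcesses);

# Wide layout: one header line, then one row per snapshot.
sub wide_rows {
    my @snapshots = @_;
    my @rows = ( join ',', @columns );
    for my $s (@snapshots) {
        push @rows, join ',', @{$s}{@columns};
    }
    return @rows;
}

# Tall layout: one row per (timestamp, field name, value) triple.
sub tall_rows {
    my @snapshots = @_;
    my @rows = ('Timestamp,Name,Value');
    for my $s (@snapshots) {
        for my $name ( grep { $_ ne 'Timestamp' } @columns ) {
            push @rows, join ',', $s->{Timestamp}, $name, $s->{$name};
        }
    }
    return @rows;
}

# Values taken from the first snapshot in the posted sample.
my %snapshot = (
    Timestamp         => 'Thu Feb 12 16:31:02 2009',
    LoadAvg1          => '0.51',
    LoadAvg2          => '0.45',
    LoadAvg3          => '0.49',
    SleepingProcesses => '483',
    RunningProcesses  => '58',
);

print "$_\n" for wide_rows( \%snapshot );
print "\n";
print "$_\n" for tall_rows( \%snapshot );
```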

Re: Parsing a non-formatted record text file
by johngg (Abbot) on Feb 16, 2009 at 16:44 UTC

    If I've understood correctly, something along these lines might break your data into the records you need.

    use strict;
    use warnings;

    my @records    = ();
    my $inRecord   = 0;
    my $rxRecStart = qr{^System:};
    my $rxRecStop  = qr{^CPU\sTTY};
    my $recordStr  = q{};

    while( <DATA> )
    {
        next if m{^\s*$};
        if( m{$rxRecStart} )
        {
            $inRecord = 1;
            push @records, $recordStr if $recordStr;
            $recordStr = $_;
        }
        elsif( m{$rxRecStop} )
        {
            $inRecord = 0;
            push @records, $recordStr if $recordStr;
            $recordStr = q{};
        }
        else
        {
            $recordStr .= $_ if $inRecord;
        }
    }

    foreach my $record ( @records )
    {
        print $record, q{+} x 50, qq{\n};
    }

    __END__
    System: hpnclass Thu Feb 12 16:31:02 2009
    Load averages: 0.51, 0.45, 0.49
    541 processes: 483 sleeping, 58 running
    Cpu states: (avg)
    LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
    0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
    Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
    CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
    1 ? 50 root 152 20 16032K 16032K run 17565:36 24.94 24.89 vxfsd
    2 ? 17979 ora102st 154 24 3989M 3588K sleep 384:22 4.65 4.65 ora_s002_pay9
    0 ? 2553 ora102st 154 24 3991M 5520K sleep 3308:08 3.47 3.47 ora_s000_pay9
    3 ? 6585 root 154 24 250M 63652K sleep 10144:59 3.27 3.27 ucsrvwp
    1 ? 17085 ora102st 154 24 3989M 5340K sleep 218:26 2.73 2.72 ora_s001_pay9
    System: hpnclass Thu Feb 12 16:36:07 2009
    Load averages: 0.52, 0.50, 0.50
    520 processes: 461 sleeping, 59 running
    Cpu states: (avg)
    LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
    0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
    Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
    CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
    1 ? 50 root 152 20 16032K 16032K run 17565:36 24.94 24.89 vxfsd
    2 ? 17979 ora102st 154 24 3989M 3588K sleep 384:22 4.65 4.65 ora_s002_pay9
    0 ? 2553 ora102st 154 24 3991M 5520K sleep 3308:08 3.47 3.47 ora_s000_pay9
    3 ? 6585 root 154 24 250M 63652K sleep 10144:59 3.27 3.27 ucsrvwp
    1 ? 17085 ora102st 154 24 3989M 5340K sleep 218:26 2.73 2.72 ora_s001_pay9

    The output.

    System: hpnclass Thu Feb 12 16:31:02 2009
    Load averages: 0.51, 0.45, 0.49
    541 processes: 483 sleeping, 58 running
    Cpu states: (avg)
    LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
    0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
    Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
    ++++++++++++++++++++++++++++++++++++++++++++++++++
    System: hpnclass Thu Feb 12 16:36:07 2009
    Load averages: 0.52, 0.50, 0.50
    520 processes: 461 sleeping, 59 running
    Cpu states: (avg)
    LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
    0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
    Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
    ++++++++++++++++++++++++++++++++++++++++++++++++++

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Corrected logic error in first if clause which was stripping the first line of the data wanted.

      Thank you. This helped me immensely, by getting rid of the data that I didn't need, and by putting in a record separator, it makes it easier for me to work with.

      TStanley
      --------
      People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell
Re: Parsing a non-formatted record text file
by leocharre (Priest) on Feb 17, 2009 at 15:13 UTC

    It's kinda weird to do this with top. Especially putting to a csv? This stuff is changing all the time- by default ordered by usage, likely- so just because something is not visible, does not mean a second later it didn't abuse your resources. Maybe ps aux might be of use here, to actually show you what's going on, compare output of that with same output a few mins later.. who knows.

    What might be more useful is - what the heck are they trying to do? See what users are hogging resources?

    If you want to monitor usage, cpu hit.. If a server is grinding to a halt.. etc .. Consider NAGIOS.

    You can code plugins for nagios with perl to do more specific alerts for your org.

    If you haven't a working nagios system, it would be in your best interest.
    (Dunno. If you just want to do your 9-5... But there might be something off here. Coding to save top output to a db.. sounds like someone's not making the most of posix- like.. keeping a ferrari in second gear.)

Re: Parsing a non-formatted record text file
by TStanley (Canon) on Feb 17, 2009 at 16:29 UTC
    Here is the final code:

    #!C:\perl\bin\perl -w
    use strict;

    my @records      = ();
    my $inrecord     = 0;
    my $rxRecStart   = qr{^System:};
    my $rxRecStop    = qr{^CPU\sTTY};
    my $recordstring = q{};
    my ($system,$date,$time,$procsleep,$procrun,$null,$load,$user,$nice,$sys,$idle,$block,$swait,$intr,$ssys,$real,$virtual,$free);
    my $IRS    = q/+/ x 50;
    my @record = ();

    # "or" is used instead of "||": with "||" the die attaches to the
    # filename string and never fires when open fails.
    open INPUTFILE,'<',"cpulist.txt" or die "Can not open cpulist.txt: $!\n";
    open TEMPOUT,'>',"tempout.txt" or die "Can not open tempout.txt: $!\n";

    # Pass 1: keep only the header lines of each top snapshot.
    while (<INPUTFILE>){
      next if m{^\s*$};
      if(m{$rxRecStart}){
        $inrecord = 1;
        push @records,$recordstring if $recordstring;
        $recordstring = $_;
      }elsif(m{$rxRecStop}){
        $inrecord = 0;
        push @records, $recordstring if $recordstring;
        $recordstring = q{};
      }else{
        $recordstring .= $_ if $inrecord;
      }
    }
    close INPUTFILE;

    # Write each record followed by a line of 50 plus signs.
    foreach my $record (@records){
      print TEMPOUT $record, q{+} x 50, qq{\n};
    }
    close TEMPOUT;

    # Pass 2: read the temp file back one record at a time,
    # using the plus-sign line as the input record separator.
    $/ = "$IRS\n";
    open TEMPOUT,'<',"tempout.txt" or die "Can not open tempout.txt: $!\n";
    open CSVOUTFILE,'>',"output.csv" or die "Can not open output.csv: $!\n";
    print CSVOUTFILE "System,Date,Time,Processes Sleeping,Processes Running,Load,User,Nice,Sys,Idle,Blocking,SWait,Intr,SSys,Real Memory(K),Virtual Memory(K),Free Memory(K)\n";

    while(<TEMPOUT>){
      chomp;
      @record = split /\n/;
      if($record[0]=~m/^System:\s+(\w+)\s+\w{3}\s+(\w{3}\s+\d{1,2})\s+(\d{1,2}:\d{1,2}:\d{1,2})\s+\d{4}$/){
        $system = $1;
        $date = $2;
        $time = $3;
      }
      if($record[2]=~m/^\d+\s+processes:\s+(\d+)\s+sleeping,\s+(\d+)\s+running$/){
        $procsleep = $1;
        $procrun = $2;
      }
      # The values line starts with whitespace, so the first field is empty.
      ($null,$load,$user,$nice,$sys,$idle,$block,$swait,$intr,$ssys) = split /\s+/,$record[5];
      if($record[6]=~m/^Memory:\s+(\d+)K\s+\(\d+K\)\s+real,\s+(\d+)K\s+\(\d+K\)\s+virtual,\s+(\d+)K\s+free(.*)$/){
        $real = $1;
        $virtual = $2;
        $free = $3;
      }
      print CSVOUTFILE "$system,$date,$time,$procsleep,$procrun,$load,$user,$nice,$sys,$idle,$block,$swait,$intr,$ssys,$real,$virtual,$free\n";
    }
    close TEMPOUT;
    close CSVOUTFILE;

    TStanley
    --------
    People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell
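
As a possible variation on the final code above (not the author's method), the temporary file could be avoided by emitting each CSV row as soon as the "CPU TTY" line is reached. This is a sketch under the assumption that every snapshot has the same layout as the posted sample; the sub names header_to_fields and snapshots_to_csv are hypothetical, and lines are recognised by pattern rather than by position.

```perl
use strict;
use warnings;

# Turn the header lines of one snapshot into an ordered list of CSV fields.
sub header_to_fields {
    my @lines = @_;
    my ( %f, $want_states );
    for my $line (@lines) {
        if ( $line =~ /^System:\s+(\w+)\s+\w{3}\s+(\w{3}\s+\d{1,2})\s+(\d{1,2}:\d{2}:\d{2})\s+\d{4}/ ) {
            @f{qw(system date time)} = ( $1, $2, $3 );
        }
        elsif ( $line =~ /^\d+\s+processes:\s+(\d+)\s+sleeping,\s+(\d+)\s+running/ ) {
            @f{qw(sleeping running)} = ( $1, $2 );
        }
        elsif ( $line =~ /^\s*LOAD\s+USER/ ) {
            $want_states = 1;    # the numbers are on the next line
        }
        elsif ($want_states) {
            $f{states} = [ split ' ', $line ];   # split ' ' ignores leading blanks
            $want_states = 0;
        }
        elsif ( $line =~ /^Memory:\s+(\d+)K\s+\(\d+K\)\s+real,\s+(\d+)K\s+\(\d+K\)\s+virtual,\s+(\d+)K\s+free/ ) {
            @f{qw(real virtual free)} = ( $1, $2, $3 );
        }
    }
    my @fields = (
        @f{qw(system date time sleeping running)},
        @{ $f{states} || [] },
        @f{qw(real virtual free)},
    );
    return map { defined $_ ? $_ : '' } @fields;   # blank out anything missing
}

# Single pass: collect header lines, print one CSV row per snapshot.
sub snapshots_to_csv {
    my ( $in, $out ) = @_;
    my @header;
    my $in_header = 0;
    while ( my $line = <$in> ) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ( $line =~ /^System:/ ) {
            @header    = ($line);
            $in_header = 1;
        }
        elsif ( $line =~ /^CPU\s+TTY/ ) {
            print {$out} join( ',', header_to_fields(@header) ), "\n" if $in_header;
            $in_header = 0;
        }
        elsif ($in_header) {
            push @header, $line;
        }
    }
    return;
}

my $sample = <<'END';
System: hpnclass Thu Feb 12 16:31:02 2009
Load averages: 0.51, 0.45, 0.49
541 processes: 483 sleeping, 58 running
Cpu states: (avg)
 LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
 0.51 14.7% 3.0% 18.3% 63.9% 0.0% 0.0% 0.0% 0.0%
Memory: 13706264K (11030248K) real, 18023572K (14848696K) virtual, 216916K free Page# 1/109
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
 1 ? 50 root 152 20 16032K 16032K run 17565:36 24.94 24.89 vxfsd
END

open my $demo_in, '<', \$sample or die "Cannot open sample: $!";
snapshots_to_csv( $demo_in, \*STDOUT );
```

Matching lines by pattern means a missing or extra header line leaves a blank field instead of shifting every later value, at the cost of a slightly longer parser.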

Node Type: perlquestion [id://744094]
Approved by Corion
Front-paged by Arunbear