
Multiple session log extraction from a single file problem

by dmtelf (Beadle)
on Aug 06, 2000 at 16:16 UTC ( #26422=perlquestion )

dmtelf has asked for the wisdom of the Perl Monks concerning the following question:

The sample servlog.txt file below shows 2 complete session logs followed by 1 incomplete session log.

Session start/end dividers are *****

A complete session log represents data between two ***** lines inclusive.
(i.e. has ***** followed by Server Initialised line then some data then Server Closed line then *****)

An incomplete session log represents data from one ***** line onwards to the end of file.
(i.e. has ***** followed by Server Initialised line then some data but no Server Closed line or *****)

I need to pull each session log (inc. *****'s) into an array - @session1,@session2 etc.

I also need another array which has something along the lines of:
("Session1 - complete","Session2 - complete","Session3 - incomplete")

How can I do this, O wise log parsing Monks?

    *****
    Server Initialised at 16:35PM, Monday, March 13, 2000
    .. data ..
    Server Closed at 12:56AM, Monday, April 3, 2000
    *****
    *****
    Server Initialised at 13:44PM, Monday, April 3, 2000
    .. data ..
    Server Closed at 04:56AM, Weds, April 7, 2000
    *****
    *****
    Server Initialised at 13:44PM, Monday, April 3, 2000
    .. data ..

Replies are listed 'Best First'.
Re: Multiple session log extraction from a single file problem
by merlyn (Sage) on Aug 06, 2000 at 19:50 UTC
    The biggest problem in your parsing is recognizing the end of one run, because in your specification, a line with stars may either be the end of the current session, or perhaps the beginning of the next one because the current session was not ended properly. So any algorithm that doesn't reinterpret the meaning of a star-line in the context of the following line is doomed to fail.

    This smells like a perfect job for Parse::RecDescent. The grammar will look something like (warning: UNTESTED):

    file: report(s?) /\Z/ { return $item[1] }

    report: complete_report | incomplete_report

    complete_report:
      star_line server_started data_line(s?) server_closed star_line
      { return ["complete:", @item[2,3,4]] }

    incomplete_report:
      star_line server_started data_line(s?) server_closed(?)
      { return ["incomplete:", @item[2,3]] }

    star_line: "*****\n"

    server_started: "Server Started" /.*\n/ { "@item[1, 2]" }

    server_closed: "Server Closed" /.*\n/ { "@item[1, 2]" }

    data_line: ...!(star_line | server_started | server_closed) /.*\n/

    The result will be an array ref like:
    [
      ["complete:", "Server Started Monday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["complete:", "Server Started Tuesday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["complete:", "Server Started Wednesday", ["data1", "data2", "data3"], "Server Closed Thursday"],
      ["incomplete:", "Server Started Monday", ["data1", "data2", "data3"]],
      ["complete:", "Server Started Monday", ["data1", "data2", "data3"], "Server Closed Thursday"],
    ]

    Hopefully, you can read up enough on Parse::RecDescent to figure out how to use this grammar and invoke it. If I get time, I'll write this up completely and repost it. Actually, it looks like a nice potential future Linux Magazine article. Thanks for the idea!

    -- Randal L. Schwartz, Perl hacker
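
    The star-line ambiguity described in the reply above can also be handled without a parser generator. What follows is only a minimal sketch, not the grammar-based solution: `split_sessions` is a hypothetical helper, and it assumes a session counts as complete only when the line right before its closing divider starts with "Server Closed". Each star line is interpreted in the context of the line that preceded it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): splits log lines into sessions,
# deciding what each star line means from the line that came before it.
sub split_sessions {
    my @lines = @_;
    my (@sessions, @labels, $current);

    # Store the session being built and record its status label.
    my $finish = sub {
        my ($state) = @_;
        push @sessions, $current;
        push @labels, "Session" . scalar(@sessions) . " - $state";
        undef $current;
    };

    for my $line (@lines) {
        if ($line =~ /^\*{5}/) {
            if (!$current) {
                $current = [$line];          # divider opens a new session
            }
            elsif ($current->[-1] =~ /^Server Closed/) {
                push @$current, $line;       # divider closes this session
                $finish->('complete');
            }
            else {
                $finish->('incomplete');     # previous session never closed
                $current = [$line];          # so this divider opens the next
            }
        }
        elsif ($current) {
            push @$current, $line;
        }
        # Lines outside any session are ignored (an assumption of this sketch).
    }
    $finish->('incomplete') if $current;     # file ended mid-session
    return (\@sessions, \@labels);
}

# The sample log from the question, one element per line.
my @log = map { "$_\n" } (
    '*****',
    'Server Initialised at 16:35PM, Monday, March 13, 2000',
    '.. data ..',
    'Server Closed at 12:56AM, Monday, April 3, 2000',
    '*****',
    '*****',
    'Server Initialised at 13:44PM, Monday, April 3, 2000',
    '.. data ..',
    'Server Closed at 04:56AM, Weds, April 7, 2000',
    '*****',
    '*****',
    'Server Initialised at 13:44PM, Monday, April 3, 2000',
    '.. data ..',
);

my ($sessions, $labels) = split_sessions(@log);
print "$_\n" for @$labels;
```

    Each element of `@$sessions` is an array ref holding one session's lines, dividers included, which stands in for the separate @session1, @session2 arrays the question asks for.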

Re: Multiple session log extraction from a single file problem
by tilly (Archbishop) on Aug 06, 2000 at 18:50 UTC
    Sorry for taking so long. (Doing other things, laundry, breakfast, you know how it is.) Anyways here is an incremental solution:
    package Service::ParseLog;
    use strict;
    use Carp;
    use Symbol qw/gensym/;

    # Takes the log filename and opens it for parsing
    sub new {
      my $class = shift;
      my $file = shift or croak("No filename passed");
      my $obj = {};
      $obj->{file} = $file;
      my $fh = gensym();
      open($fh, '<', $file) or confess("Cannot read $file: $!");
      $obj->{fh} = $fh;
      $obj->{line_no} = 0;
      return bless $obj, $class;
    }

    # Reads the next service section. In array context it returns
    # the lines in the service. In scalar context it returns true
    # if a complete service was read. The service is in the
    # last_service field. The is_parsing field indicates
    # whether the service it started is still missing its end.
    sub read_service {
      my $self = shift;
      my $fh = $self->{fh};
      my $line_no = $self->{line_no};

      # Find the *'s that start the service. A bare if (<$fh>) would
      # not set $_, so read into a lexical instead.
      my $line = <$fh>;
      unless (defined $line) {
        # EOF
        undef($self->{last_service});
        return;
      }
      ++$line_no;
      unless ($line =~ /^\*{5}/) {
        confess("No next service at $line_no in $self->{file}");
      }

      # Grab a service section and return it
      my @service;
      $self->{is_parsing} = 1;
      while (defined($line = <$fh>)) {
        ++$line_no;
        if ($line =~ /^\*{5}/) {
          # End of service
          $self->{is_parsing} = 0;
          last;
        }
        push @service, $line;
      }
      $self->{line_no} = $line_no;
      $self->{last_service} = \@service;
      return wantarray ? @service : !$self->{is_parsing};
    }

    (If you want, stick a 1; at the end and make it into a module.)

    How would you use it? Well like this:

    my $log = new Service::ParseLog("servlog.txt");
    while ($log->read_service()) {
      my @lines = @{$log->{last_service}};
      # do stuff
    }
    if ($log->{is_parsing}) {
      # Incomplete last service
    }

    Note that I did a fair amount of validation. If your format is not exactly what you described you could have some problems. The reported errors should be informative enough to figure out what is wrong though, just change the tests.

    And /tell tilly if you have any problems with this. :-)

RE: Multiple session log extraction from a single file problem
by nuance (Hermit) on Aug 06, 2000 at 19:11 UTC
    I would localise the $/ variable and set it to the string that splits your log entries "*****\n*****". If you then read from the file in a scalar context it will give you one complete log entry.

    You then need to remove the string you split the file on ("*****\n*****") and split the remainder on a new line "\n". Place the result of this split in an array. If the last element of this array starts with "Server Closed", then you have a complete log entry.

    #!/usr/bin/perl -w
    use strict;

    my @status;

    {
      my ($log, @split_log, $status);
      local $/ = qq{*****\n*****};
      while ($log = <DATA>) {
        $log =~ s/\Q*****\E\n\Q*****\E$//;
        @split_log = split "\n", $log;
        $status = ($split_log[$#split_log] =~ m/^Server Closed/)
                ? "complete\n"
                : "incomplete\n";
        push @status, $status;
      }
    }

    __DATA__
    *****
    Server Started Monday
    data1
    data2
    Server Closed Tuesday
    *****
    *****
    Server Started Wednesday
    data3
    data4
    Server Closed Friday
    *****
    *****
    Server Started
    data5

    This leaves the string "*****" on the front of the first log entry. You can then do whatever processing you need on the logs either one complete log at a time, or instead of assigning the split to @split_log, create an anonymous array and push that onto @split_log. This will let you defer your processing to the end.
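
    The anonymous-array variant mentioned above might look roughly like the sketch below. It is an illustration under stated assumptions, not the poster's code: the names `@sessions` and `@labels` are invented here, the log is read from an in-memory filehandle instead of DATA, and the dividers are put back around each session because the question asks for them to be included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the sample data above, held in a string for this sketch.
my $text = <<'END_LOG';
*****
Server Started Monday
data1
data2
Server Closed Tuesday
*****
*****
Server Started Wednesday
data3
data4
Server Closed Friday
*****
*****
Server Started
data5
END_LOG

open my $fh, '<', \$text or die "Cannot open in-memory log: $!";

my (@sessions, @labels);
{
    # Each read now returns everything up to the next double divider.
    local $/ = "*****\n*****";
    while (my $log = <$fh>) {
        chomp $log;    # removes the trailing "*****\n*****" when present
        my @lines = grep { length } split /\n/, $log;

        # Drop a divider left at either end so every session looks the same.
        shift @lines if @lines && $lines[0]  eq '*****';
        pop   @lines if @lines && $lines[-1] eq '*****';
        next unless @lines;

        my $complete = $lines[-1] =~ /^Server Closed/;

        # One anonymous array per session, with the dividers restored.
        push @sessions, [ '*****', @lines, ($complete ? '*****' : ()) ];
        push @labels, sprintf("Session%d - %s",
            scalar(@sessions), $complete ? 'complete' : 'incomplete');
    }
}
close $fh;

print "$_\n" for @labels;
```

    Here `$sessions[0]` plays the role of @session1, `$sessions[1]` of @session2, and so on, while `@labels` is the status list the question asks for.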

