Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Parsing a specific section of a log file

by vxp (Pilgrim)
on Feb 28, 2007 at 21:42 UTC ( #602605=perlquestion: print w/ replies, xml ) Need Help??
vxp has asked for the wisdom of the Perl Monks concerning the following question:

I've a simple ASCII log file. it's got a bunch of lines like this:
2007-02-28 15:24:52,374 INFO [MMSRequestSender] Found 0 items to send + to Media Manager 2007-02-28 15:24:54,221 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382228162 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:24:55,209 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382228170 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:24:56,362 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382169066 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:24:57,496 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382221366 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:24:58,540 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382208089 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:24:59,629 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382208089 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:25:00,933 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382237543 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:25:04,037 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 381722093 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:25:06,987 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 381722093 TO MEDIAREQUESTQHISTORY. STATUS = FAILURE 2007-02-28 15:25:41,545 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382246716 TO MEDIAREQUESTQHISTORY. STATUS = SUCCESS 2007-02-28 15:25:52,375 INFO [MMSRequestSender] Found 4 items to send + to Media Manager 2007-02-28 15:26:08,050 INFO [MMSResponseListener] MOVING RECORD WITH + OCCURRENCEID: 382237436 TO MEDIAREQUESTQHISTORY. STATUS = SUCCESS
What I need to do is parse that log file every 5 minutes and find out if within the past 30 minutes there was at least 1 STATUS = SUCCESS line, and at least 1 line containing a non-zero value in a "Found 0 items to send to Media Manager" string. I can't really think of a good way / logic to parse specifically last 30 minutes of that log file.. My logic goes something like: get current date at the time the script's ran subtract 30 minutes from it, and find the line in the log file that corresponds to that timestamp .... how do i go forward on from there? i know regular expressions, thats not the problem, i just dont know how to tell the script to start looking only from that 1 line forward Any ideas ? :)

Comment on Parsing a specific section of a log file
Download Code
Re: Parsing a specific section of a log file
by jettero (Monsignor) on Feb 28, 2007 at 21:49 UTC

    Personally, for tasks like this, I usually use File::ReadBackwards and Time::Local qw(timelocal_nocheck) to quickly read only the recent lines (and stop reading when things get) $too_old.

    In fact, I learned about Time::Local at this site, asking a related question: Fast date parsing.

    -Paul

      Wow, I didn't know about File::ReadBackwards! I'll have to try that. Here's what I've been doing to parse an Apache error log at work that's never been rotated. I use a start date and an end date, but I think the same principle should apply to times.
      use Tie::File; tie my @lines, 'Tie::File', 'error.log'; LINE: for (my $i=$#lines; $i >= 0; $i--) { my $d = substr($lines[$i], 1, 24); my $file_date = $time{"yyyy-mm-dd", ParseDate($d)}; next LINE if ($file_date gt $edate); last LINE if ($file_date lt $sdate); # process here }

        The reason I suggested Time::Local, though: if your dates are all identicaly shapped and speed is an issue (which it may be at 5 minute intervals) then it's like a million times faster to use timelocal_nocheck. It's more work too, but it can be worth it in some situations.

        Er, I guessed ParseDate was from Date::Manip.

        -Paul

Re: Parsing a specific section of a log file
by Anno (Deacon) on Feb 28, 2007 at 21:53 UTC
    I don't even know the module, but it sounds like File::ReadBackwards would be of help here. Alternatively, consider Tie::File and a binary search for the time.

    Anno

Re: Parsing a specific section of a log file
by Zaxo (Archbishop) on Feb 28, 2007 at 21:57 UTC

    File::Tail will help with that. You can write a continuously running process which monitors the log and reads new entries at an interval you select.

    Just push the timestamp from success records onto an array, splice out all the timestamps over half an hour old, and warn if the array is empty.

    As with any continuously running service, it would be a good idea to make a Daemon of it.

    After Compline,
    Zaxo

      Another alternative is POE::Wheel::FollowTail. I think it is cleaner for that, and uses the POE framework for getting your job done. For daemonizing it, you can use POE::Component::Daemon.

      Igor 'izut' Sutton
      your code, your rules.

Re: Parsing a specific section of a log file
by ysth (Canon) on Feb 28, 2007 at 22:00 UTC
    I would do something like this (untested):
    use File::ReadBackwards; use POSIX "strftime"; my $success; my $items; my $earliest = POSIX::strftime( "%Y-%m-%d %H:%M:%S", gmtime( time() - +30 * 60 ) ); $bw = File::ReadBackwards->new( 'log_file' ) or die "can't read 'log_file' $!" ; while ( defined( my $log_line = $bw->readline() ) ) { last if $log_line lt $earliest; $success ||= $log_line =~ /STATUS = SUCCESS/; $items ||= $log_line =~ /Found [1-9]\d* items to send/; last if $success && $items; }
Re: Parsing a specific section of a log file
by kyle (Abbot) on Feb 28, 2007 at 22:00 UTC

    Maybe it would be better to have a long-running process watch the log. You can use File::Tail to read the log as it's written and note when a "STATUS = SUCCESS" line happens. When one comes in, set an alarm (alarm 30 * 60), which will reset any earlier alarm you set. When an alarm goes off , you know that 30 minutes have gone by without a success line, and you can take whatever action you like.

Re: Parsing a specific section of a log file
by vxp (Pilgrim) on Feb 28, 2007 at 22:34 UTC
    wow, many ideas, i didn't expect to see so many in such short time! +1 to everyone, from me :D although i'm sure i'll be fine with these - don't be shy and post up m +ore as you come up with them! unfortunately, though, the continuously reading file / have a daemon i +deas aren't good for my particular situation, as this check runs ever +y 5 mins from Hobbit (a monitoring tool, sorta like BigBrother) this is good stuff, though :)
Re: Parsing a specific section of a log file
by f00li5h (Chaplain) on Mar 01, 2007 at 02:59 UTC

    Is there any chance of tampering with the program/script that produces the log, and have it perform whatever task you want this log monitoring script to do at the time those things would be written into the log?

    This does ofcourse assume bundles of things about your situation, but it may still be a little cleaner for you to catch the data before it goes into the log file, so you don't have to drag it out again.

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;
      no :)

      its some java thing written by our development dept., i dont think theyd want to change it.. plus not really necessary either..

      i followed suggestions here and here's the solution i came up with (used localtime instead of gmtime, gmtime() didnt return correct time)

      use File::ReadBackwards; use POSIX "strftime"; my $success; my $items; my $earliest = POSIX::strftime( "%Y-%m-%d %H:%M:%S", localtime( time() + - 30 * 60 ) ); $bw = File::ReadBackwards->new( '/var/log/mmsagent/mmsagent.log' ) or die "can't read '/var/log/mmsagent/mmsagent.lo +g' $!" ; while ( defined( my $log_line = $bw->readline() ) ) { last if $log_line lt $earliest; if ($log_line =~ /STATUS = SUCCESS/) { print $log_line; } }

      works perfect, only prints out the relevant lines if they happened within half an hour ago. much thanks to everyone involved for their help!

Re: Parsing a specific section of a log file
by hangon (Deacon) on Mar 01, 2007 at 05:08 UTC

    Here's a possible solution. It involves storing the ending line number from the last six passes in a temp file. I'll leave the coding to you.

    • First read in the line counts from the temp file into @linecounts.
    • Get the starting point: $start = shift @linecounts
    • On the pass through the log file iterate directly to the starting point: for (0..$start) {<FH>}.
    • From there start parsing dates and log entries while incrementing $start with each line.
    • Do whatever if entries found/not found
    • Add new line count to array: push (@linecounts, $start)
    • Rewrite temp file from @linecounts.
    • Repeat in 5 minutes
Re: Parsing a specific section of a log file
by sanPerl (Friar) on Mar 01, 2007 at 06:05 UTC
    I am just assuming few things before providing you my solution.
    Assumption 1:You are running this script on Unix/Linux,
    Assumption 2:The log file gets appended from bottom

    Script can save the latest log file to some other directory after every run. Then script will run system 'diff' command to find out differences between latest log file and your saved log file (Which is the last version of log file). You will get entries, which are latest addition after last execution time of script. Work on those entries and use regex your way.
Re: Parsing a specific section of a log file
by eriam (Beadle) on Mar 01, 2007 at 07:55 UTC
Re: Parsing a specific section of a log file
by Moron (Curate) on Mar 01, 2007 at 10:11 UTC
    If at say 10:25, the situation at 10:20 is known, it obviously can't change five minutes later. Also if the log file is large, the File::Tail will perform better than the implicit costs of closing, reopening and reading backwards every time anyway. The advantage of that one over the POE solution is that you can sleep for 300 seconds in between to minimise consumption of system resources, whereas POE will keep itself somewhat busier, though not as busy as reading backwards every time.

    But it often helps in busier systems to divide very regular processing into small slots by design, so that the next daemon to be implemented has a slot available whose consumption by other processes can reasonably be predicted - that can't be done for the POE solution.

    -M

    Free your mind

      I don't see why you can't do that with POE.

      If you have other processing to do once in a while just add it as an event in your main POE processing daemon (which is fo example tailling a log file up to now) and fire it up as often as you wish.

      Then it will hopefully take full advantage of POE's cooperative behaviour.

      Honestly implementing POE is an effort (small) but as long as you build the daemon from scratch I think it's a good choice (and you don't need cron jobs..).

      Anyway there is more then one way to tail a log file :)
        Event-driven processing is distinct from scheduled processing and consumption distribution can only be managed from scheduled processing. In addition, not every process on a system is in general written in Perl. Nor does one generally have the luxury of rewriting a system just because, say, one believes event-driven processing to be aesthetically better - that is at least how managers tend to view it.

        In my own situation, the only way I can achieve a totally event-driven version of the kind of system I work on is to pay for the development myself.

        -M

        Free your mind

      I use POE::Wheel::FollowTail to parse, analyse and forward events concurrently from two very busy logfiles (each about 100M a day, average 10 significant lines per second, picked out of three or four times that many, peaks way above that). I have POE look at the file every five seconds and the CPU hit is negligable compared to other processes on the box. I guess it queues itself an alarm between checking the file as this particular POE is doing little else (throws an I'm alive event every minute just in case).

      I like the fact the FollowTail handle the log file being rolled over, tells me about errors without choking and generaly just does the right thing. It saved a lot of thinking about all the corner cases (in my case these files are r/o NFS mounts from elswhere and occasionly mis-behave) and let me concentrate on the interesting bit.

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
Re: Parsing a specific section of a log file
by jethro (Monsignor) on Mar 03, 2007 at 01:32 UTC
    My solution would involve the seek and tell functions to resume every five minutes where you left off the last time. And a small file where you would store file position from tell() and the results of the last 5 chunks of 5(!) minutes of logfile you scanned (lets call them subresults).

    The result of a 30 minutes check would then be the logical AND of the last 6 subresults (five from the file and one from the last scan) for the success line combined with the logical AND of the last 6 subresults for the non-zero value.

    Hope I made sense.

      but the goal is to check every 5 minutes for 30 mins back :)

      so if for example now is 17:15 you want to read the file back to 16:45 and note if theres a string you are interested in

      and at 17:20 you are supposed to check back to the 16:50 timestamp and note if theres at least 1 string in that timeframe

      and if i understood you correctly (perhaps i didnt ? ) your solution would only "run" (the logical AND) every 30 mins

      and to further complicate matters, this script cant run as a daemon/process that continously runs :)

      this is a Hobbit (monitoring tool like BigBrother, Nagios, etc) check, and as such can only run when Hobbit execs it (each 5 mins in this case) :)

        No, my solution would make a logical AND every 5 minutes. I'll elaborate:

        At 17:20 your program would get called and would read in a small file (lets call it /tmp/last). In that file would be stored
        1) a file positon from tell
        2) two boolean values for the time slot 16:50 to 16:55 (note the difference to above)
        3) two boolean values for the time slot 16:55 to 17:00
        ...
        6) two boolean values for the time slot 17:10 to 17:15

        Now your program would open the logfile, seek to the file position from 1), read the logfile from there and calculate the two boolean values (is there a success line? is there a non-zero value ?). This would be the result for time slot 17:15 to 17:20

        Now the program can calculate the endresult for the half hour by ANDing the values from 2) to 6) and the newly calculated values.

        Finally the program overwrites /tmp/last and stores the following into it:
        1) the file position from tell
        2) two boolean values for the time slot 16:55 to 17:00
        2) two boolean values for the time slot 17:00 to 17:05
        ...
        6) the newly calculated boolean values for the time slot 17:15 to 17:20

        After 5 minutes your program gets called and does all of the above again. The temporary file is like a pipeline storing only the last 5 results, shifting out an old one and getting a new one.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://602605]
Approved by friedo
Front-paged by ysth
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2014-12-27 09:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls