PerlMonks  

Extract new lines from log file

by smithers (Friar)
on Jan 02, 2007 at 19:41 UTC ( [id://592620] : perlquestion )

smithers has asked for the wisdom of the Perl Monks concerning the following question:

Looking for direction and opinions. Background: I need to monitor 100s of application log files on approx 100 Windows 2000/2003 servers. Frequency of monitoring ranges from hourly to daily. Standard functionality in Perl 5.8 is working great from a single centralized server using Windows UNC file pathing to other local servers and log files. Note: monitoring the logs from a centralized server saves me time by eliminating the need for change-management plans to add scripts or Perl binaries to 100 validated production servers.

Issue: I need to expand my monitoring to include numerous remote servers -- some accessed via slow or bandwidth-impaired links. My problem is not the large remote log file, per se, as only a few new lines are appended hourly or daily. Rather my approach for extracting the new lines from the large log files seems to suck. My current logic to get new lines is:

1) if file modification date has changed, open file, count number of lines and close.
2) if newly-obtained line count differs from last line count, reopen file.
3) read past and ignore old lines.
4) read new lines and analyze patterns.
5) persist new file line count and mod date for next analysis.

This dual read (once for line count, another to get the new lines) is where all my script CPU and wall time is spent and I could obviously try to combine steps 1 - 4 into a single journey through the file. However, before I do that I thought I would ask for suggestions. Is there a better way to periodically extract the new lines from a log file? Again, with the constraint that I not deploy any scripts or perl distros to the local or remote servers where the logs reside?
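
For reference, the single-pass variant mentioned above might look like the sketch below. The helper name `new_lines_by_count` is hypothetical, and the sketch still reads the whole file from the start, so it only removes the second pass and not the cost of transferring the old lines.

```perl
use strict;
use warnings;

# Hypothetical helper: skip the first $old_count lines and return the rest,
# together with the new total line count, in one pass over the file.
sub new_lines_by_count {
    my ($path, $old_count) = @_;

    open(my $fh, '<', $path) or die "can't open $path: $!";

    my @new;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
        # Lines up to $old_count were analyzed on a previous run.
        push @new, $line if $count > $old_count;
    }
    close($fh);

    return (\@new, $count);
}
```

The caller would persist the returned count in place of the old one, as in step 5.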

Thanks for sharing any ideas you may have.

Replies are listed 'Best First'.
Re: Extract new lines from log file
by BrowserUk (Patriarch) on Jan 02, 2007 at 19:55 UTC

    Why not store the filesize instead of the line count? Then your 5 steps become

    1. Query filesize
    2. If the file has grown since last time, open the file and seek to the previous eof.
    3. Read the new lines and persist the new filesize.
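
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tested monitoring script: the helper name `read_new_lines` is hypothetical, and treating a smaller file as a rotated or truncated log is an assumption that the thread does not discuss.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines appended after byte offset
# $last_size, plus the offset to remember for the next run.
sub read_new_lines {
    my ($path, $last_size) = @_;

    # Step 1: query the file size (element 7 of stat is the size in bytes).
    my $size = (stat $path)[7];
    die "can't stat $path: $!" unless defined $size;

    # Assumption: a smaller file means the log was rotated or truncated,
    # so start again from the beginning.
    $last_size = 0 if $size < $last_size;
    return ([], $size) if $size == $last_size;    # nothing new

    # Step 2: open the file and seek to the previous end of file.
    open(my $fh, '<', $path) or die "can't open $path: $!";
    binmode($fh);    # raw bytes, so offsets are not skewed by CRLF translation
    seek($fh, $last_size, 0) or die "can't seek in $path: $!";

    # Step 3: read only the new data and note where we stopped.
    my @lines    = <$fh>;
    my $new_size = tell($fh);
    close($fh);

    return (\@lines, $new_size);
}
```

Because the handle is in binmode, lines from a Windows log keep their trailing "\r\n". If the writer is in the middle of a line when the file is read, the last element may be a partial line.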

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extract new lines from log file
by ferreira (Chaplain) on Jan 02, 2007 at 20:00 UTC
    Is there a better way to periodically extract the new lines from a log file?

    I did it once and the algorithm was very similar to the one you described. The only thing that changed was that instead of saving the line count, I saved the return value of tell (that is, the file offset at the last read). This way, when you decide the file has changed, you read back the offset where you stopped, seek to it, and go on from that place. I think you're gonna be quite pleased with the performance after this straightforward change.
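
A minimal sketch of the tell-and-seek idea, with the offset persisted between runs. The helper names and the one-integer state file are assumptions made for illustration, not the code ferreira actually used.

```perl
use strict;
use warnings;

# Hypothetical helper: read the saved offset; a missing or malformed
# state file means "start at byte 0".
sub load_offset {
    my ($state_file) = @_;
    open(my $fh, '<', $state_file) or return 0;
    my $offset = <$fh>;
    close($fh);
    return 0 unless defined $offset && $offset =~ /^(\d+)/;
    return $1;
}

# Hypothetical helper: store the offset as a single line of text.
sub save_offset {
    my ($state_file, $offset) = @_;
    open(my $fh, '>', $state_file) or die "can't write $state_file: $!";
    print {$fh} "$offset\n";
    close($fh) or die "can't close $state_file: $!";
}

# Resume at the saved offset, pass each new line to $callback,
# then record the value of tell() for the next run.
sub process_new_lines {
    my ($log, $state_file, $callback) = @_;
    my $offset = load_offset($state_file);

    open(my $fh, '<', $log) or die "can't open $log: $!";
    binmode($fh);    # keep byte offsets exact on Win32
    # Assumption: an offset past the end of the file means the log was rotated.
    $offset = 0 if $offset > (stat $fh)[7];
    seek($fh, $offset, 0) or die "can't seek in $log: $!";

    while (my $line = <$fh>) {
        $callback->($line);
    }

    save_offset($state_file, tell($fh));
    close($fh);
}
```

A call such as `process_new_lines($log, $state, sub { print $_[0] if $_[0] =~ /ERROR/ })` would print only the new lines matching a pattern; the pattern is just an example.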

Re: Extract new lines from log file
by smithers (Friar) on Jan 02, 2007 at 20:22 UTC
    Thank you BrowserUk and ferreira. These are helpful suggestions.
      You are so right! Seeking is wicked fast compared with line-by-line processing. For example, test.pl below is a quick script that times reading the last 1024 bytes of a 4GB file located on a remote server.

      use strict;

      my $filename = shift;
      my ( $curpos, $charsread, $buffer );

      open(LOGFILE, "<", $filename) or die "can't open file: $!";

      # Read last 1024 bytes of data. Test this via
      # read to -1024 bytes from EOF.
      seek(LOGFILE, -1024, 2);
      $curpos = tell(LOGFILE);
      $charsread = read( LOGFILE, $buffer, 1024 );

      print "Chars Read: $charsread\n";
      print "Buffer: >$buffer<\n";

      close(LOGFILE);

      I ran it as shown in the next paragraph, replacing my actual server name with "foo". I suspect most of you work with Perl on Unix, but I'm loving Perl on Win32, and when I can use UNC file pathing as shown below it just tickles me that Perl is Win32 friendly in this way. Anyway, I digress. The result of the script run below is the last 1024 bytes of the file in sub-second time. This is sooooooooooo fast! Thank you!

      test.pl \\foo\D$\MSSQL\BACKUP\DW\DW1_db_200701041904.BAK
Re: Extract new lines from log file
by pileofrogs (Priest) on Jan 02, 2007 at 23:17 UTC

    First, using the file size, as recommended above, probably just solves your problem, so I'm just muttering academically here...

    It sounds like you've got the remote files mounted on a network drive of some kind, right? Depending on the file system and whether or not it caches, your script might actually be downloading the entire file every time it reads it.

    It might be easier to copy the file to a local partition and then process it.

    -Pileofrogs