http://www.perlmonks.org?node_id=598991


in reply to Untangling Log Files

If the requirement is continuous, you'll need some kind of daemon (perhaps invoked at system startup) to pick up new appendages to the logfiles shortly after they arrive. Lets also assume that messages have a timestamp, otherwise duplicate events separated only in time would be indistinguishable.

To allow for reboot of the system, the daemon will need to keep track of the timestamp of the last message it collated for each machine writing messages to the logfiles (in case their clocks are out of synch.)

There also needs to be a structure of regular expressions that enables not just identification of the originating process but of the timestamp which needs to be converted into a delta time for comparison. In a dynamic environment this might best be achieved using a csv configuration file e.g.:

PROCESS,HOST,LOGFILE,FORMAT,$1,$2 foo,host99,/var/adm/foo.log,\s+\S+\s+(\S+)\s+(\d+\-\d+\-\d+\s\d+:\d+:\ +d+:\s\w{2}),PROC,TIMESTAMP
Once all that is sorted out there still remains the routine work for the daemon of reading in the config file, reading in the timestamp tracker file (one line per host), for each file (only one filehandle needed!) matching lines of logfiles against the configured regexps and ignoring entries prior to the timestamp for the host, updating the per-process file and the journal file with the latest timestamp (plus originating host) of a message just transferred to the per-process file.

It also needs to sleep perhaps five minutes between cycles through all the log files to free system resources for other processes.

Update: a common practice is also to routinely archive and delete logfiles (yet another logfile management daemon!) so that such reprocessing doesn't have to start from the beginning of a very large logfile, and then have to read but ignore millions of entries occurring before the last recorded timestamp. One system I work with regularly archives logfiles when they hit 5 MB instead of by time or line count. It might be convenient for your requirement if the message-collating daemon could also (per cycle) check the size and conditionally do or invoke that archiving itself.

-M

Free your mind