http://www.perlmonks.org?node_id=469093

ministry has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

Here's my dilemma: I'm working on a script that uses a handy piece of code from the cookbook that more or less does a tail -f on my logfile. From there I process each line of the log as it comes in, for various reports. My question is: how do I efficiently remove lines from this logfile, for example if they match one of my regexes?
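A minimal sketch of the usual tail -f idiom in Perl, assuming a plain polling loop. It is not necessarily the exact cookbook code referred to above, and read_new_lines is a hypothetical helper name. The key trick is seek($fh, 0, 1), a zero-byte seek from the current position, which clears the handle's end-of-file state so later reads pick up lines the application has appended since.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line appended since the last call.
# A line the writer has only partly written is returned as-is, without
# its trailing newline.
sub read_new_lines {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line;
    }
    # Seeking 0 bytes from the current position (whence 1) clears the
    # EOF state without moving the file pointer.
    seek($fh, 0, 1);
    return @lines;
}

# Usage sketch (commented out so the block runs standalone):
# open(my $log, '<', '/path/to/app.log') or die "open: $!";
# while (1) {
#     for my $line (read_new_lines($log)) {
#         # ...process line...
#     }
#     sleep 1;
# }
```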

I have tried the Tie::File module, but after some initial testing it does not seem to be the most effective way to accomplish my goal. The log file I am processing grows very quickly, and Tie::File didn't keep up as well as I believe it should. My second idea was to use the truncate function to remove the line, but I have not been able to get it to work properly: my output is always the first line of my pattern match, and then the rest of the file is truncated.

...if regex...
seek(LOG,0,0) || die "Seek error: $!";
...process line...
truncate(LOG,tell(LOG))

I'm sure there is something simple I am missing, or perhaps there's a better way to accomplish this task. Any feedback would be greatly appreciated.

Cheers, Ev

Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.

Re: removing lines from a file
by polettix (Vicar) on Jun 22, 2005 at 17:46 UTC
    Two writers on the same file seem like a Bad Thing to me if there is no coordination between them. And you basically can't tell the application to stop while you write, or be sure of getting a time slice to do your job (because the file is growing fast, as you say).

    You probably need a filter that does not work in place. Open another output file, write all the lines you consider good to it, and get rid of the original log file only when you're sure it is safe to do so. This way you don't risk losing data (e.g. when you truncate the file just after the application wrote an important log line, before you had the chance to read it).
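A minimal sketch of that non-in-place filter, assuming the lines to filter sit in a file you can read from start to finish (for example a log the application has already rotated away). filter_to_file, the paths and the qr/DEBUG/ pattern are hypothetical. The original file is only ever read, so deleting it remains a separate, deliberate step.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the lines that do NOT match $re from $in_path
# to $out_path. The original log is only read, never modified.
# Returns the number of lines left out of the copy.
sub filter_to_file {
    my ($in_path, $out_path, $re) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    my $dropped = 0;
    while (my $line = <$in>) {
        if ($line =~ $re) {
            $dropped++;    # matched: leave it out of the copy
            next;
        }
        print {$out} $line;
    }
    close($out) or die "Cannot close $out_path: $!";
    close($in);
    return $dropped;
}
```

For example, filter_to_file('app.log.1', 'app.log.1.filtered', qr/DEBUG/) returns how many lines it left out, and app.log.1 is untouched until you decide to remove it.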

    Another option is to use a pipe: the application writes data to the pipe, and you read it and do your filtering. This requires either that you can divert log messages to standard error (for example), or that you use a named pipe. The first option could be tricky (maybe standard error is already in use), and the second requires extreme care, because if you don't open the named pipe for reading, your application will hang. I can elaborate if you're interested in these options.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re: removing lines from a file
by rev_1318 (Chaplain) on Jun 22, 2005 at 18:33 UTC
    If you're on a Unix-like system, all your problems could be solved if you can replace your logfile with a FIFO (named pipe). Then you can have the application that produces the log write to it, and comfortably read from it yourself. After processing, you can do with the lines as you please (write them to an additional, real logfile or discard them).
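A minimal sketch of the FIFO approach, assuming a Unix-like system and an application that can be told to log to an arbitrary path. filter_fifo and the paths are hypothetical. POSIX::mkfifo creates the named pipe, opening it for reading blocks until a writer shows up, and the read loop ends once the last writer closes its end.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);

# Hypothetical helper: read log lines from the FIFO at $fifo, print the
# ones that do not match $re to $out_fh, and discard the rest.
# Returns when the last writer closes its end of the pipe (EOF).
sub filter_fifo {
    my ($fifo, $re, $out_fh) = @_;
    unless (-p $fifo) {
        mkfifo($fifo, 0600) or die "mkfifo $fifo: $!";
    }
    # Blocks here until some process opens the FIFO for writing.
    open(my $in, '<', $fifo) or die "open $fifo: $!";
    while (my $line = <$in>) {
        print {$out_fh} $line unless $line =~ $re;
    }
    close($in);
    return;
}
```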

    Paul

      You have some alternatives for this task, but you should consider these points:
      - will the log file be rotated?
      - can the application that writes the log write to a FIFO?

      If you go with the FIFO, be careful that no log-rotation tool rotates the FIFO itself. With a FIFO you can easily discard the lines you don't want to store. If the application can write to STDOUT, you can redirect STDOUT to STDERR and read it (as STDIN is buffered), then discard all the lines you don't care about.

        A program doesn't need to know whether it's writing to a FIFO or not, unless you're dealing with networked files, i.e. files on an NFS- or SMB-like share. The only issue I can see here is that opening a FIFO for writing usually requires a listener, so if your filter application crashes (or you don't start it), your program is likely to have problems (a SIGPIPE in the first case, hanging in the second). Another possible issue is the limited buffer between the two processes: your filter program must keep up, or it will block the log producer.
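The listener requirement can be checked from the producer side. POSIX specifies that opening a FIFO write-only with O_NONBLOCK fails with ENXIO when no process has it open for reading, whereas a plain open would block. A minimal sketch, with a hypothetical helper fifo_has_reader:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_NONBLOCK);
use Errno qw(ENXIO);

# Hypothetical helper: returns 1 if some process currently has $fifo open
# for reading, 0 if not. A non-blocking write-only open fails with ENXIO
# when there is no reader, instead of blocking.
# Note: the probe briefly opens and closes a write end, so a reader with
# no other writer attached may see an early EOF.
sub fifo_has_reader {
    my ($fifo) = @_;
    if (sysopen(my $fh, $fifo, O_WRONLY | O_NONBLOCK)) {
        close($fh);
        return 1;
    }
    return 0 if $! == ENXIO;
    die "sysopen $fifo: $!";
}
```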

        The FIFO solution leaves the duty to write the logs to the Perl filter, not to the original application, and the filter is likely to write regular files - something that a logrotate program should not be upset with.

        As for the redirection, I don't really understand how it should work. Redirecting STDOUT to STDERR means the listener application basically loses all the lines. The pipeline

        producer | consumer
        links producer's STDOUT to consumer's STDIN, so the suggested redirection leaves you with an empty STDIN and nowhere to read log lines from.
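Here is a minimal sketch of the consumer side of producer | consumer, assuming the producer writes its log lines to STDOUT. filter_stream and qr/DEBUG/ are hypothetical. It takes filehandles as arguments so the same loop works on STDIN or on any piped open.

```perl
use strict;
use warnings;

# Hypothetical consumer loop: copy lines from $in to $out unless they
# match $re. In a real pipeline it would be called as
#   filter_stream(\*STDIN, \*STDOUT, qr/DEBUG/);
sub filter_stream {
    my ($in, $out, $re) = @_;
    while (my $line = <$in>) {
        next if $line =~ $re;    # drop unwanted lines
        print {$out} $line;
    }
    return;
}
```

Run it as something like producer | perl filter.pl > kept.log, where filter.pl (a hypothetical script name) makes the call shown in the comment.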

        Flavio
        perl -ple'$_=reverse' <<<ti.xittelop@oivalf

        Don't fool yourself.
        Writing to a FIFO should be no problem; for the application, it's like writing to an ordinary file. But log rotation is a good point; there you should indeed be careful.

        Paul

Re: removing lines from a file
by g0n (Priest) on Jun 22, 2005 at 17:30 UTC
    ...if regex...
    seek(LOG,0,0) || die "Seek error: $!";
    ...process line...
    truncate(LOG,tell(LOG))

    Unless I'm missing something, you've moved the filehandle position back to the beginning with seek. When you subsequently truncate to size tell(LOG), the file is cut at the position you seeked to (0, plus whatever you read after the seek), not at the line you matched.
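A small reproduction of the question's pattern on a throwaway file (truncate_after_seek is a hypothetical name). After seeking to 0 and reading one line, tell returns the offset just past that first line, so truncate keeps only the first line, which matches the symptom described in the question. With nothing read after the seek, the file would be cut to zero bytes.

```perl
use strict;
use warnings;

# Reproduce the question's pattern: seek to the start, read one line,
# then truncate at tell(). Returns what is left in the file.
sub truncate_after_seek {
    my ($path) = @_;
    open(my $log, '+<', $path) or die "open $path: $!";
    # ...pretend a regex matched somewhere further down...
    seek($log, 0, 0) or die "Seek error: $!";
    my $first = <$log>;                      # the "...process line..." step
    truncate($log, tell($log)) or die "truncate: $!";
    close($log);
    open(my $in, '<', $path) or die "reopen $path: $!";
    local $/;                                # slurp mode
    return scalar <$in>;
}
```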

    Try setting the position to a variable first:

    ...if regex...
    $pos = tell(LOG);
    seek(LOG,0,0) || die "Seek error: $!";
    ...process line...
    truncate(LOG,$pos)

    That corrects the immediate problem, but will still leave the line in question in place. If you set a variable to the string length of the line you've matched, and

    truncate(LOG,$pos-$stringlength);

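    that should get rid of your matched line. Be aware that truncate also discards everything after that point, so this removes only the matched line when it is the last line in the file.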
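A sketch of that fix, assuming the matched line is the last one in the file, as in a tail -f loop that has just caught up (drop_matching_tail_line is a hypothetical name). truncate discards everything past the offset it is given, so lines appended after the match would be lost as well. binmode is used so that length counts bytes, which is what tell and truncate work with.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $path; at the first line matching $re, cut
# the file back to the start of that line. Everything from the matched
# line to the end of the file is removed. Returns 1 if a cut was made,
# 0 if nothing matched.
sub drop_matching_tail_line {
    my ($path, $re) = @_;
    open(my $log, '+<', $path) or die "open $path: $!";
    binmode($log);                       # byte semantics: length() == bytes
    while (my $line = <$log>) {
        next unless $line =~ $re;
        my $pos = tell($log);            # offset just past the matched line
        truncate($log, $pos - length($line)) or die "truncate: $!";
        close($log);
        return 1;
    }
    close($log);
    return 0;
}
```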

    Note: there may be better ways to do this.

    Update: I also strongly agree with the first line of frodo72's response, especially as you've said that the file is growing quickly. I would be inclined to use two files, and write everything that doesn't match your regex out to the second file.

    --------------------------------------------------------------

    g0n, backpropagated monk