Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Man, File::Tail is just beating me down!

by hallikpapa (Scribe)
on Jan 17, 2008 at 16:37 UTC ( #662916=perlquestion: print w/ replies, xml ) Need Help??
hallikpapa has asked for the wisdom of the Perl Monks concerning the following question:

I have this log file, grows to about ~1.5GB a day (starts at 0 length file, and the first tail restart happened when the file was less than 100mb), and there is one file per day. Rolls over at midnight. At 00:01 I restart the process and open the new file. I connect to this machine via IO::Socket and send a tail command using File::Tail The tail is constantly restarting, 6 times the night before, 10 times last night. Here is what I have done which brought me to the conclusion it's the tail: 1) Moved all other traffic off the interface on both end points of the IO::Socket 2) Moved the client side machine behind a brand new switch (previous one had potential problems, so I ruled that out) 3) Ran a bunch of tests to see how many records / bandwidth, utilization, etc being pumped thru this pipe. WELL below the juice running thru most other boxes. 4) Put in a few checks in case there is a restart on the client and server side so it doesn't resend all data from the beginning of the log file, it remembers last line sent thru the socket. So based off of the example and what's in the skimpish documentation for the File::Tail , I wrote this routine that sends the data. If you notice my log warning about the TAIL reset, that's how I know it happened 10 times last night.

$file=File::Tail->new(name=>$filename, maxinterval=>5, interval=>1, ta +il=>-1, ignore_nonexistant => 1, errmode=>\&do_exit); while ( <$sock> ) { until (-e $filename) { sleep 10; } next unless /\S/; # blank line log_notice "Tailing $filename\n"; while ( defined($line=$file->read) ) { if (!defined $first_record) { $first_record = $line; } elsif ($line eq $first_record) { log_warn("TAIL reset at record: ".$record." Msg: " +.$!); $last_sent = $record; $record = 0; } $record++; if ($record > $last_sent) { my @f=(split /;/,$line); if ($f[21] > $max_seq_num) { print $sock $record.";".$cdr_file.";".$thishos +t.";".$line; } else { #log_notice("Skipping seq_num $f[21]"); } } } # End While # 2 &do_term($sock); } # End While sock

NOW...My question is, if I show you glorious Monks the Tail.pm, could anyone help me figure what is causing it to jump back to the beinning of the file? It's killing me. :(
Tail.pm
package File::Tail; use strict; use vars qw($VERSION @ISA @EXPORT @EXPORT_OK); require Exporter; @ISA = qw(Exporter); # Items to export into callers namespace by default. Note: do not expo +rt # names by default without a very good reason. Use EXPORT_OK instead. # Do not simply export all your public functions/methods/constants. $VERSION = '0.99.3'; # Preloaded methods go here. use FileHandle; #use IO::Seekable; # does not define SEEK_SET in 5005.02 use File::stat; use Carp; use Time::HiRes qw ( time sleep ); #import hires microsecond timers sub SEEK_SET () { 0; } sub SEEK_CUR () { 1; } sub SEEK_END () { 2; } sub interval { my $object = shift @_; if (@_) { $object->{interval} = shift; $object->{interval} = $object->{maxinterval} if $object->{interval} > $object->{maxinterval}; } $object->{interval}; } sub logit { my $object = shift; my @call = caller(1); print # STDERR # time()." ". "\033[7m" . $call[3] . " " . $object->{"input"} . " " . join( "", @_ ) . "\033[0m" . "\n" if $object->debug; } sub adjustafter { my $self = shift; $self->{adjustafter} = shift if @_; return $self->{adjustafter}; } sub debug { my $self = shift; $self->{"debug"} = shift if @_; return $self->{"debug"}; } sub errmode { my ( $self, $mode ) = @_; my ($prev) = $self->{errormode}; if ( @_ >= 2 ) { ## Set the error mode. defined $mode or $mode = ''; if ( ref($mode) eq 'CODE' ) { $self->{errormode} = $mode; } elsif ( ref($mode) eq 'ARRAY' ) { unless ( ref( $mode->[0] ) eq 'CODE' ) { croak 'bad errmode: first item in list must be a code +ref'; $mode = 'die'; } $self->{errormode} = $mode; } else { $self->{errormode} = lc $mode; } } $prev; } sub errmsg { my ( $self, @errmsgs ) = @_; my ($prev) = $self->{errormsg}; if ( @_ > 0 ) { $self->{errormsg} = join '', @errmsgs; } $prev; } # end sub errmsg sub error { my ( $self, @errmsg ) = @_; my ( $errmsg, $func, $mode, @args, ); if ( @_ >= 1 ) { ## Put error message in the object. $errmsg = join '', @errmsg; $self->{"errormsg"} = $errmsg; ## Do the error action as described by error mode. $mode = $self->{"errormode"}; if ( ref($mode) eq 'CODE' ) { &$mode($errmsg); return; } elsif ( ref($mode) eq 'ARRAY' ) { ( $func, @args ) = @$mode; &$func(@args); return; } elsif ( $mode eq "return" ) { return; } elsif ( $mode eq "warn" ) { carp $errmsg; } else { # die croak $errmsg; } } else { return $self->{"errormsg"} ne ''; } } # end sub error sub copy { my $self = shift; $self->{copy} = shift if @_; return $self->{copy}; } sub tail { my $self = shift; $self->{"tail"} = shift if @_; return $self->{"tail"}; } sub reset_tail { my $self = shift; $self->{reset_tail} = shift if @_; return $self->{reset_tail}; } sub nowait { my $self = shift; $self->{nowait} = shift if @_; return $self->{nowait}; } sub method { my $self = shift; $self->{method} = shift if @_; return $self->{method}; } sub input { my $self = shift; $self->{input} = shift if @_; return $self->{input}; } sub maxinterval { my $self = shift; $self->{maxinterval} = shift if @_; return $self->{maxinterval}; } sub resetafter { my $self = shift; $self->{resetafter} = shift if @_; return $self->{resetafter}; } sub ignore_nonexistant { my $self = shift; $self->{ignore_nonexistant} = shift if @_; return $self->{ignore_nonexistant}; } sub name_changes { my $self = shift; $self->{name_changes_callback} = shift if @_; return $self->{name_changes_callback}; } sub TIEHANDLE { my $ref = new(@_); } sub READLINE { $_[0]->read(); } sub PRINT { $_[0]->error("PRINT makes no sense in File::Tail"); } sub PRINTF { $_[0]->error("PRINTF makes no sense in File::Tail"); } sub READ { $_[0]->error( "READ not implemented in File::Tail -- use READLINE (<HANDLE>) + instead" ); } sub GETC { $_[0]->error( "GETC not (yet) implemented in File::Tail -- use READLINE (<HANDLE>) i +nstead" ); } sub DESTROY { my ($this) = $_[0]; close( $this->{"handle"} ) if ( defined($this) && defined( $this->{'handle'} ) ); # undef $_[0]; return; } sub CLOSE { &DESTROY(@_); } sub new { my ($pkg) = shift @_; $pkg = ref($pkg) || $pkg; unless ($pkg) { $pkg = "File::Tail"; } my %params; if ( $#_ == 0 ) { $params{"name"} = $_[0]; } else { if ( ( $#_ % 2 ) != 1 ) { croak "Odd number of parameters for new"; return; } %params = @_; } my $object = {}; bless $object, $pkg; unless ( defined( $params{'name'} ) ) { croak "No file name given. Pass filename as \"name\" parameter +"; return; } $object->input( $params{'name'} ); $object->copy( $params{'cname'} ); $object->method( $params{'method'} || "tail" ); $object->{buffer} = ""; $object->maxinterval( $params{'maxinterval'} || 60 ); $object->interval( $params{'interval'} || 10 ); $object->adjustafter( $params{'adjustafter'} || 10 ); $object->errmode( $params{'errmode'} || "die" ); $object->resetafter( $params{'resetafter'} || ( $object->maxinterval * $object->adjustafter ) ); $object->{"debug"} = ( $params{'debug'} || +0 ); $object->{"tail"} = ( $params{'tail'} || +0 ); $object->{"nowait"} = ( $params{'nowait'} || +0 ); $object->{"maxbuf"} = ( $params{'maxbuf'} || +16384 ); $object->{"name_changes_callback"} = ( $params{'name_changes'} || +undef ); if ( defined $params{'reset_tail'} ) { $object->{"reset_tail"} = $params{'reset_tail'}; } else { $object->{"reset_tail"} = -1; } $object->{'ignore_nonexistant'} = ( $params{'ignore_nonexistant'} +|| 0 ); $object->{"lastread"} = 0; $object->{"sleepcount"} = 0; $object->{"lastcheck"} = 0; $object->{"lastreset"} = 0; $object->{"nextcheck"} = time(); if ( $object->{"method"} eq "tail" ) { $object->reset_pointers; } # $object->{curpos}=0; # ADDED 25May01: undef warnings w +hen # $object->{endpos}=0; # starting up on a nonexistant +file return $object; } # Sets position in file when first opened or after that when reset: # Sets {endpos} and {curpos} for current {handle} based on {tail}. # Sets {tail} to value of {reset_tail}; effect is that first call # uses {tail} and subsequent calls use {reset_tail}. sub position { my $object = shift; $object->{"endpos"} = sysseek( $object->{handle}, 0, SEEK_END ); unless ( $object->{"tail"} ) { $object->{endpos} = $object->{curpos} = sysseek( $object->{handle}, 0, SEEK_END ); } elsif ( $object->{"tail"} < 0 ) { $object->{endpos} = sysseek( $object->{handle}, 0, SEEK_END ); $object->{curpos} = sysseek( $object->{handle}, 0, SEEK_SET ); } else { my $crs = 0; my $maxlen = sysseek( $object->{handle}, 0, SEEK_END ); while ( $crs < $object->{"tail"} + 1 ) { my $avlen = length( $object->{"buffer"} ) / ( $crs + 1 ); $avlen = 80 unless $avlen; my $calclen = $avlen * $object->{"tail"}; $calclen += 1024 if $calclen <= length( $object->{"buffer" +} ); $calclen = $maxlen if $calclen > $maxlen; $object->{curpos} = sysseek( $object->{handle}, -$calclen, SEEK_END ); sysread( $object->{handle}, $object->{"buffer"}, $calclen +); $object->{curpos} = sysseek( $object->{handle}, 0, SEEK_CU +R ); $crs = $object->{"buffer"} =~ tr/\n//; last if ( $calclen >= $maxlen ); } $object->{curpos} = sysseek( $object->{handle}, 0, SEEK_CUR ); $object->{endpos} = sysseek( $object->{handle}, 0, SEEK_END ); if ( $crs > $object->{"tail"} ) { my $toskip = $crs - $object->{"tail"}; my $pos; $pos = index( $object->{"buffer"}, "\n" ); while ( --$toskip ) { $pos = index( $object->{"buffer"}, "\n", $pos + 1 ); } $object->{"buffer"} = substr( $object->{"buffer"}, $pos + +1 ); } } $object->{"tail"} = $object->{"reset_tail"}; } # Tries to open or reopen the file; failure is an error unless # {ignore_nonexistant} is set. # # For a new file (ie, first time opened) just does some book-keeping # and calls position for initial position setup. Otherwise does some # checks whether file has been replaced, and if so changes to the new # file. (Calls position for reset setup). # # Always updates {lastreset} to current time. # sub reset_pointers { my $object = shift @_; $object->{lastreset} = time(); my $st; my $oldhandle = $object->{handle}; my $newhandle = FileHandle->new; my $newname; if ( $oldhandle && $$object{'name_changes_callback'} ) { $newname = $$object{'name_changes_callback'}(); } else { $newname = $object->input; } unless ( open( $newhandle, "<$newname" ) ) { if ( $object->{'ignore_nonexistant'} ) { # If we have an oldhandle, leave endpos and curpos to what +they # were, since oldhandle will still be the "current" handle +elsewhere, # eg, checkpending. This also allows tailing a file which +is removed # but still being written to. if ( !$oldhandle ) { $object->{'endpos'} = 0; $object->{'curpos'} = 0; } return; } $object->error( "Error opening " . $object->input . ": $!" ); $object->{'endpos'} = 0 unless defined( $object->{'endpos'} ); $object->{'curpos'} = 0 unless defined( $object->{'curpos'} ); return; } binmode($newhandle); if ( defined($oldhandle) ) { # If file has not been changed since last OK read do not do an +ything $st = stat($newhandle); # lastread uses fractional time, stat doesn't. This can cause +false # negatives. # If the file was changed the same second as it was last read, # we only reopen it if it's length has changed. The alternativ +e is that # sometimes, files would be reopened needlessly, and with rese +t_tail # set to -1, we would see the whole file again. # Of course, if the file was removed the same second as when i +t was # last read, and replaced (within that second) with a file of +equal # length, we're out of luck. I don't see how to fix this. if ( $st->mtime <= int( $object->{'lastread'} ) ) { if ( $st->size == $object->{"curpos"} ) { $object->{lastread} = $st->mtime; return; } else { # will continue further to reset } } else { } $object->{handle} = $newhandle; $object->position; $object->{lastread} = $st->mtime; close($oldhandle); } else { # This is the first time we are opening this file $st = stat($newhandle); $object->{handle} = $newhandle; $object->position; $object->{lastread} = $st->mtime; # for better estimate on in +itial read } } sub checkpending { my $object = shift @_; my $old_lastcheck = $object->{lastcheck}; $object->{"lastcheck"} = time; unless ( $object->{handle} ) { $object->reset_pointers; unless ( $object->{handle} ) { # This try did not open the f +ile either return 0; } } $object->{"endpos"} = sysseek( $object->{handle}, 0, SEEK_END ); if ( $object->{"endpos"} < $object->{curpos} ) { # file was tru +ncated $object->position; } elsif (( $object->{curpos} == $object->{"endpos"} ) && ( time() - $object->{lastread} ) > $object->{'resetafter'} +) { $object->reset_pointers; $object->{"endpos"} = sysseek( $object->{handle}, 0, SEEK_END +); } if ( $object->{"endpos"} - $object->{curpos} ) { sysseek( $object->{handle}, $object->{curpos}, SEEK_SET ); readin( $object, $object->{"endpos"} - $object->{curpos} ); } return ( $object->{"endpos"} - $object->{curpos} ); } sub predict { my $object = shift; my $crs = $object->{"buffer"} =~ tr/\n//; # Count newlines i +n buffer my @call = caller(1); return 0 if $crs; my $ttw = $object->{"nextcheck"} - time(); return $ttw if $ttw > 0; if ( my $len = $object->checkpending ) { readin( $object, $len ); return 0; } if ( $object->{"sleepcount"} > $object->adjustafter ) { $object->{"sleepcount"} = 0; $object->interval( $object->interval * 10 ); } $object->{"sleepcount"}++; $object->{"nextcheck"} = time() + $object->interval; return ( $object->interval ); } sub bitprint { return "undef" unless defined( $_[0] ); return unpack( "b*", $_[0] ); } sub select { my $object = shift @_ if ref( $_[0] ); my ( $timeout, @fds ) = splice( @_, 3 ); $object = $fds[0] unless defined($object); my ( $savein, $saveout, $saveerr ) = @_; my ( $minpred, $mustreturn ); if ( defined($timeout) ) { $minpred = $timeout; $mustreturn = time() + $timeout; } else { $minpred = $fds[0]->predict; } foreach (@fds) { my $val = $_->predict; $minpred = $val if $minpred > $val; } my ( $nfound, $timeleft ); my @retarr; while ( defined($timeout) ? ( !$nfound && ( time() < $mustreturn ) ) : !$nfound ) { # Restore bitmaps in case we called select before splice( @_, 0, 3, $savein, $saveout, $saveerr ); ( $nfound, $timeleft ) = select( $_[0], $_[1], $_[2], $minpred + ); if ( defined($timeout) ) { $minpred = $timeout; } else { $minpred = $fds[0]->predict; } undef @retarr; foreach (@fds) { my $val = $_->predict; $nfound++ unless $val; $minpred = $val if $minpred > $val; push( @retarr, $_ ) unless $val; } } if (wantarray) { return ( $nfound, $timeleft, @retarr ); } else { return $nfound; } } sub readin { my $crs; my ( $object, $len ) = @_; if ( length( $object->{"buffer"} ) ) { # this means the file was reset AND a tail -n was active $crs = $object->{"buffer"} =~ tr/\n//; # Count newlines in +buffer return $crs if $crs; } $len = $object->{"maxbuf"} if ( $len > $object->{"maxbuf"} ); my $nlen = $len; while ( $nlen > 0 ) { $len = sysread( $object->{handle}, $object->{"buffer"}, $nlen, length( $object->{"buffer"} ) ); return 0 if $len == 0; # Some busy filesystems return 0 somet +imes, # and never give anything more from th +en on if # you don't give them time to rest. Th +is return # allows File::Tail to use the usual e +xponential # backoff. $nlen = $nlen - $len; } $object->{curpos} = sysseek( $object->{handle}, 0, SEEK_CUR ); $crs = $object->{"buffer"} =~ tr/\n//; if ($crs) { my $tmp = time; $object->{lastread} = $tmp if $object->{lastread} > $tmp; # +??? $object->interval( ( $tmp - ( $object->{lastread} ) ) / $crs ) +; $object->{lastread} = $tmp; } return ($crs); } sub read { my $object = shift @_; my $len; my $pending = $object->{"endpos"} - $object->{"curpos"}; my $crs = $object->{"buffer"} =~ m/\n/; while ( !$pending && !$crs ) { $object->{"sleepcount"} = 0; while ( $object->predict ) { if ( $object->nowait ) { if (wantarray) { return (); } else { return ""; } } sleep( $object->interval ) if ( $object->interval > 0 ); } $pending = $object->{"endpos"} - $object->{"curpos"}; $crs = $object->{"buffer"} =~ m/\n/; } if ( !length( $object->{"buffer"} ) || index( $object->{"buffer"}, "\n" ) < 0 ) { readin( $object, $pending ); } unless (wantarray) { my $str = substr( $object->{"buffer"}, 0, 1 + index( $object->{"buffer"}, "\n" ) ); $object->{"buffer"} = substr( $object->{"buffer"}, 1 + index( $object->{"buffer"}, + "\n" ) ); return $str; } else { my @str; while ( index( $object->{"buffer"}, "\n" ) > -1 ) { push( @str, substr( $object->{"buffer"}, 0, 1 + index( $object->{"buffer"}, "\n" ) ) ); $object->{"buffer"} = substr( $object->{"buffer"}, 1 + index( $object->{"buffer"}, "\n" ) ); } return @str; } } 1;

Comment on Man, File::Tail is just beating me down!
Select or Download Code
Re: Man, File::Tail is just beating me down!
by gloryhack (Deacon) on Jan 17, 2008 at 18:07 UTC
    I could be so far off that I can't even see left field, but: Is it guaranteed that your logger will always in some way make each new line unique? If there's any way in normal operation for the "current" line to contain the same data as the first line, then your test is invalid.
      It would be impossible. Each line is unique, they come off of a VOIP switch that has many unique fields, like sequence number, call id, etc...
Re: Man, File::Tail is just beating me down!
by BrowserUk (Pope) on Jan 17, 2008 at 19:07 UTC

    I don't understand your code at all. Why are reading from the same socket as you are writing to?

    And what on earth do you mean by:

    and send a tail command using File::Tail

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      while (<$sock>) Just means while the connection is open between the two machines. I send stuff back and forth between the two machines right up until the client machine sends "tail\n" Then all of the data streams across the socket. Sorry about the "send tail using File::Tail" Not very clear. Hope the above clears it up. The client sends that command and the data starts pumpin'
Re: Man, File::Tail is just beating me down!
by runrig (Abbot) on Jan 17, 2008 at 21:02 UTC
Re: Man, File::Tail is just beating me down!
by starbolin (Hermit) on Jan 17, 2008 at 23:52 UTC

    In your code this:

    if (!defined $first_record) { $first_record = $line;

    seems really dangerous as you can't say for certain the contents of $line. Could be a null, a lone line terminator or whatever the last thing was you wrote to the socket. I know you may have convinced yourself there are only valid records but the validation checks need to be done in code and not in the head. Put the validation checks in the code elsewise you're just guessing.


    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
      It's an excellent point and something I will defintely review, but because this problem has been ongoing, I am logging where it fails, which record, etc...and in some cases have even caught it happening. The problem I have discussed in this thread is seems to be unrelated. When I see a restart, I look at the line it restarted on, and check that line, the previous 10, and even the next 10. All standard...

        Rereading my post I realize that I was very unclear as to what I was seeing. I was assuming only the evidence you had for a restart was the logger script. If I was wrong then please supply any other logs you may have. Seeing that the logger was flawed I assumed that cleaning up the logger would give a clearer picture of what was happening. I saw nothing in what you had posted to convince me it wasn't a logger problem. Are you saying that some problem manifests itself even without the posted logger script running?? You'll have to excuse me in that I was a bit myopic in focusing on the code that was posted.

        Could you, please, post the code in the host machine that calls the suspected module?



        s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
Re: Man, File::Tail is just beating me down!
by matija (Priest) on Jan 20, 2008 at 14:03 UTC

    First of all, File::Tail is on CPAN - there is no need to post it all here.

    Having said that, I think you are running into some kind of a timing problem. File::Tail tries to be clever about detecting when a file has been rotated, and it seems that, under very heavy loads, it sometimes gets a false positive (it thinks that the file has been rotated when it really has not).

    I've never been able to generate a reliable test case that would trigger that bug so I could debug it.

    My advice is:

    • See if bumping maxinterval to a higher value helps
    • See if bumping adjustfter to a higher value helps
    • (may not be feasible for your application) put 0 or a small positive value into reset_tail - that will make it restart from the end of the file, or n lines from the end of the file - which might be better for your application that starting from the very beginning.

    Hope this helps

      Excellent suggestions. If I use the reset_tail function and I am keeping track of the line number(s) that are being read, is there a way I can make it jump straight to a specific line in case of a restart?
        Not currently, no.
      I'm running into the same issue, something causing resets. I have set reset_tail to 0, but the file pointer is still reset to the beginning of the file, which has very bad effects in my application. I'll try setting maxinterval and adjustafter to see if it helps. I'd be happy to help debug this. I think I can reproduce the bug about once/day.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://662916]
Approved by moritz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (7)
As of 2014-12-20 04:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (95 votes), past polls