
Re^2: Using binary search to get the last 15 minutes of httpd access log

by mcdave (Beadle)
on Aug 03, 2012 at 21:19 UTC


in reply to Re: Using binary search to get the last 15 minutes of httpd access log
in thread Using binary search to get the last 15 minutes of httpd access log

Very minor quibble, and not about Perl: Apache creates the timestamp when the request is received but writes the line to the file when the request has been served. So, if you've got requests that take a long time to return, or lots of requests that take a very short time, your logs won't be ordered by time.

So, if it's 12:00, and A is a long-running request that started at 11:44 and B is a short-running request, the order of events could be

11:44:00... A received
11:45:10... B received
11:45:11... B served
11:46:00... A served

and the log will read

11:45:10 B
11:44:00 A

in that order, so your backward search would stop at A and miss B.

If "last 15 minutes" means "approximately the last 15 minutes" and your server is relatively zippy, you're fine. I have sometimes needed 15 minutes to mean "exactly 15 minutes", though, so I happen to know this bit of trivia about Apache.
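
The A/B example can be reproduced with a small simulation. This is only a sketch: the request records, the `hms`, `log_lines` and `naive_backward` helpers, and the use of seconds since midnight are all assumptions made for illustration, not anything Apache provides.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Convert h:m:s to seconds since midnight (hypothetical helper).
sub hms { my ($h, $m, $s) = @_; return $h * 3600 + $m * 60 + $s }

# Hypothetical requests from the example: each line is stamped with its
# "received" time but is written to the log only at its "served" time.
my @requests = (
    { name => 'A', received => hms(11, 44, 0),  served => hms(11, 46, 0)  },
    { name => 'B', received => hms(11, 45, 10), served => hms(11, 45, 11) },
);

# Line order in the log follows completion time, not the stamp in the line.
sub log_lines {
    my @reqs = @_;
    return sort { $a->{served} <=> $b->{served} } @reqs;
}

# Naive backward scan: walk from the end of the log and stop at the
# first stamp that is older than the cutoff.
sub naive_backward {
    my ($cutoff, @lines) = @_;
    my @hits;
    for my $line (reverse @lines) {
        last if $line->{received} < $cutoff;
        unshift @hits, $line->{name};
    }
    return @hits;
}

my @log = log_lines(@requests);
print "log order: ", join(' ', map { $_->{name} } @log), "\n";

# At 12:00 the 15-minute window starts at 11:45:00.
my @found = naive_backward(hms(11, 45, 0), @log);
print scalar(@found), " request(s) found by the naive scan\n";
```

The scan meets A (stamped 11:44:00) first and stops, so B is never seen even though its stamp of 11:45:10 falls inside the window.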

Replies are listed 'Best First'.
Re^3: Using binary search to get the last 15 minutes of httpd access log
by davido (Cardinal) on Aug 03, 2012 at 23:21 UTC

    That being the case, there's virtually no way to be sure a slow request isn't lurking in the past without scanning back 15 minutes plus some known timeout interval. Still, reading backward with a timeout interval added is probably more reliable than doing a binary search on a file that may not be in correct order. Binary searches aren't too good with fuzzy sortedness. ;)


    Dave
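
A minimal sketch of the "15 minutes plus a timeout interval" backward scan follows. It assumes that no request takes longer than `$slack` seconds to be served; the sub name `backward_with_slack`, the `[stamp, text]` entry layout, and the sample values are all hypothetical. For brevity it walks an in-memory list instead of reading a file backward.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Scan entries newest-to-oldest. Keep those stamped inside the window, skip
# those just outside it, and stop only when a stamp is older than the window
# by more than $slack seconds. Assumption: every request is served within
# $slack seconds of being received, so nothing in the window can sit
# earlier in the file than such an entry.
sub backward_with_slack {
    my ($now, $window, $slack, @entries) = @_;   # each entry: [stamp, text]
    my $cutoff = $now - $window;
    my @hits;
    for my $entry (reverse @entries) {
        my ($stamp, $text) = @$entry;
        last if $stamp < $cutoff - $slack;   # safely past the window
        next if $stamp < $cutoff;            # too old, but keep looking
        unshift @hits, $text;
    }
    return @hits;
}

# Stamps are seconds since midnight; entries are listed in file order.
my @entries = (
    [ 41400, 'C' ],   # 11:30:00
    [ 42310, 'B' ],   # 11:45:10
    [ 42240, 'A' ],   # 11:44:00, slow request logged after B
);

# now = 12:00:00, window = 15 minutes, slack = 5 minutes (assumed timeout).
my @recent = backward_with_slack(43200, 900, 300, @entries);
print "in window: @recent\n";
```

With a slack of zero this degenerates into the naive scan and stops at A. The price of the slack is reading a few extra minutes of log that end up being discarded.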

Re^3: Using binary search to get the last 15 minutes of httpd access log
by mhearse (Chaplain) on Aug 03, 2012 at 21:57 UTC
    That is a valid point, which I considered too. But this script will run on a loghost, and if I remember correctly, the timestamp is applied to the messages as they arrive at the loghost.
      Here is the code I ended up using to search the ordered Apache logs for a specific time block.
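      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Search::Dict;

      open my $fh, '<', '/var/log/http/access_log'
          or die "Cannot open access_log: $!";

      my $start = 'Oct 2 10:21:';       # prefix of the first line wanted
      my $end   = 'Oct 2 10:2[1-2]:';   # regex matching the whole time block

      # Binary search: leaves $fh at the first line that is ge $start.
      look $fh, $start;

      # Print lines until one falls outside the time block.
      while (my $line = <$fh>) {
          last if $line !~ /$end/;
          print $line;
      }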
      Also, looks like I should have done a more thorough search
