PerlMonks  

Re: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

by kcott (Chancellor)
on Jun 17, 2015 at 07:19 UTC (#1130766=note)


in reply to Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

G'day lulz,

Welcome to the Monastery.

Reading an entire logfile into memory prior to processing would be very much the exception; the norm would be to process the file a line at a time.
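As a rough illustration of the line-at-a-time approach, here is a minimal sketch. The helper name count_matching_lines and the choice to take the log path from the command line are assumptions made for this example, not part of the original question. Only one line is held in memory on each pass through the loop.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): count the lines in a
# file that match a pattern, holding only one line in memory at a time.
sub count_matching_lines {
    my ($path, $pattern) = @_;

    open my $fh, '<', $path or die "Cannot open '$path': $!";

    my $count = 0;
    while (my $line = <$fh>) {      # reads a single line per iteration
        $count++ if $line =~ $pattern;
    }

    close $fh;
    return $count;
}

# Usage: pass the path to a log file as the first argument.
my $log = shift @ARGV;
if (defined $log) {
    printf "%d GET request(s)\n", count_matching_lines($log, qr/"GET /);
}
```

By contrast, assigning the filehandle to an array (my @lines = <$fh>) would load the whole file first, which is rarely necessary for log processing.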

The format of each log entry is defined in the Apache configuration file (httpd.conf or whatever you've called it). From my httpd.conf, here are the lines that describe the access_log:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    ...
    CustomLog "/private/var/log/apache2/access_log" common

See the documentation in Apache Module mod_log_config for a description of the %X codes and other related information.

With that information to hand, it's fairly easy to construct a regex to parse the log records. Here's a script to do that. The three DATA lines are taken verbatim from my access_log file.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # LogFormat "%h %l %u %t \"%r\" %>s %b" common

    my $re = qr{
        ^
        ( \S+ )                     # capture remote host (%h)
        \s+
        ( \S+ )                     # capture remote logname (%l)
        \s+
        ( \S+ )                     # capture remote user (%u)
        \s+ \[
        ( [^\]]+ )                  # capture request time (%t) without brackets
        \] \s+ "
        ( (?: [^"\\]++ | \\. )*+ )  # capture first line of request (%r)
        " \s+
        ( \d+ )                     # capture final status (%>s)
        \s+
        ( \d+ )                     # capture response size in bytes (%b)
        $
    }x;

    my $format = join '',
        "Host: %s\n",
        "Logname: %s\n",
        "User: %s\n",
        "Time: %s\n",
        "Request: %s\n",
        "Status: %d\n",
        "Size: %d\n\n";

    printf $format, /$re/ while <DATA>;

    __DATA__
    127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509
    127.0.0.1 - ken [22/Apr/2015:13:35:21 +1000] "GET /bin/admin.pl HTTP/1.1" 500 656
    127.0.0.1 - - [24/Apr/2015:04:51:49 +1000] "GET / HTTP/1.1" 200 45

Output:

    Host: 127.0.0.1
    Logname: -
    User: -
    Time: 22/Apr/2015:13:35:04 +1000
    Request: GET /bin/admin.pl HTTP/1.1
    Status: 401
    Size: 509

    Host: 127.0.0.1
    Logname: -
    User: ken
    Time: 22/Apr/2015:13:35:21 +1000
    Request: GET /bin/admin.pl HTTP/1.1
    Status: 500
    Size: 656

    Host: 127.0.0.1
    Logname: -
    User: -
    Time: 24/Apr/2015:04:51:49 +1000
    Request: GET / HTTP/1.1
    Status: 200
    Size: 45

Be aware that your configuration may use other logfiles with different LogFormat directives; however, you should be able to construct a suitable regex using the script above as a template. And, of course, you'll probably want to do something more useful than just printing the data.
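To give one idea of how that adaptation might look, here is a sketch for Apache's stock "combined" format, which appends quoted Referer and User-Agent fields to the common format. Check the actual directive in your own httpd.conf before relying on it. The helper name parse_combined, the sample lines (adapted from the data above, with invented Referer and User-Agent values and one invented 304 entry) and the per-status tally are all assumptions for illustration. It uses named captures, so it needs Perl 5.10 or later, and it allows for %b being logged as "-" when no bytes are sent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed directive (the usual "combined" format; verify against httpd.conf):
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

my $quoted = qr{ (?: [^"\\]++ | \\. )*+ }x;    # body of a double-quoted field

my $combined_re = qr{
    ^
    (?<host>    \S+ ) \s+
    (?<logname> \S+ ) \s+
    (?<user>    \S+ ) \s+
    \[ (?<time> [^\]]+ ) \] \s+
    " (?<request> $quoted ) " \s+
    (?<status>  \d+ ) \s+
    (?<size>    \d+ | - ) \s+       # %b is logged as "-" when no bytes are sent
    " (?<referer> $quoted ) " \s+
    " (?<agent>   $quoted ) "
    $
}x;

# Hypothetical helper: return a hash ref of named fields, or undef on no match.
sub parse_combined {
    my ($line) = @_;
    return unless $line =~ $combined_re;
    my %field = %+;                             # copy the named captures
    $field{size} = 0 if $field{size} eq '-';    # normalise "-" to 0 bytes
    return \%field;
}

# Illustrative sample lines only; Referer and User-Agent values are invented.
my @sample = (
    q{127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509 "-" "curl/7.0"},
    q{127.0.0.1 - ken [22/Apr/2015:13:35:21 +1000] "GET /bin/admin.pl HTTP/1.1" 500 656 "-" "curl/7.0"},
    q{127.0.0.1 - - [24/Apr/2015:04:51:49 +1000] "GET / HTTP/1.1" 304 - "http://localhost/" "curl/7.0"},
);

# Something slightly more useful than printing: tally hits and bytes by status.
my (%hits, %bytes);
for my $line (@sample) {
    my $f = parse_combined($line) or next;      # skip lines that do not match
    $hits{ $f->{status} }++;
    $bytes{ $f->{status} } += $f->{size};
}

printf "%s: %d request(s), %d byte(s)\n", $_, $hits{$_}, $bytes{$_}
    for sort keys %hits;
```

Named captures keep the field names next to the pattern pieces they describe, which may make it easier to add or remove fields when the LogFormat changes. To run this against a real log, replace the loop over @sample with a while loop reading from a filehandle.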

-- Ken
