Hello,
I'm trying to split the lines of an Apache access log into two outputs depending on whether they match a regexp, and I have a problem.
It seems that lines which don't match my regexp end up in the matched output as well...
Example command line:
$ cat access_log | perl -pe 'if (s/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*"(GET|POST|HEAD) (.*?) HTTP\/.*$/$1,$3/) { print } else { print STDERR }' 2>out.err > out.csv
Result on STDOUT (out.csv):
[...]
72.30.161.243,/
72.30.161.243,/
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 - "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 -
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516 "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516
96.243.255.188,//phpMyAdmin/
[...]
STDERR (out.err) contains only the expected lines (those not matching the regexp).
Corresponding part of the source file:
72.30.161.243 - - [03/Oct/2009:17:21:43 +0200] "GET / HTTP/1.0" 404 516 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
72.30.161.243 - - [03/Oct/2009:17:21:43 +0200] "GET / HTTP/1.0" 404 516
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 - "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 -
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516 "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516
96.243.255.188 - - [04/Oct/2009:00:26:17 +0200] "GET //phpMyAdmin/ HTTP/1.1" 404 516 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
96.243.255.188 - - [04/Oct/2009:00:26:17 +0200] "GET //phpMyAdmin/ HTTP/1.1" 404 516
Has anyone encountered a similar bug, or is it my regexp? It seems hard to believe that it matched the CONNECT line...
Normally, out.csv should contain only CSV lines.
Thanks,
Best regards