PerlMonks
By this point, you'd think the topic of web server logfile parsing would be completely mined out, with all of the rough edges filed off, if not pounded completely flat. Something that showed up recently in my logfiles, coupled with what I've seen recently in print, suggests that the topic still has some unmapped pitfalls.
Here's the familiar drill: To parse a "common log format" file (assuming you're doing it yourself), the conventional wisdom says to write a single regexp that begins /^(\S+) (\S+) (\S+) \[ ... and whose third capture lands in $auth_user.

Yawn. Been there, done that, right? Maybe not.

Let's take a second look at $auth_user. Unless you're using basic authentication to password protect pages, you'll see this in your logs as '-'. No problem there. And if you are using basic authentication, you'll see a username. No problem there... unless the username contains whitespace, at which point the regexp fails to match. And since there's no check to see if it fails...

But can a username contain whitespace? Let's see. D'oh! RFC 1945 says you aren't supposed to be able to do this! (Update: RFC 1945 is obsolete. RFC 2617 suggests that embedded spaces are OK. Hm...)

The simple solution would seem to be "So, don't do that!", but here's where things get stranger. I've recently seen a case where somebody apparently presented an Authorization: header to a non-protected resource on my site, resulting in a bogus name appearing in the logs. I say "apparently" because I haven't been able to duplicate the behavior, and I can't think of any other way for the bogus username to have appeared. It's a minor annoyance, or the basis for a crude denial-of-accurate-service attack against log analysis software.

Fortunately, the solution is straightforward. All you have to do is change /^(\S+) (\S+) (\S+) \[ ... to /^(\S+) (\S+) (.+) \[ ... This makes the regex much less efficient, since the greedy .+ has to backtrack to find the ' [' that starts the timestamp, but it will resolve the problem, even if someone forges the username "dws [".
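
For anyone who wants to poke at this, here is a minimal sketch that compares the two patterns side by side. The full regexp from the post isn't reproduced here, so everything after the '[' is my own assumption: a fairly strict timestamp pattern, a quoted request, then status and bytes. The sample log lines and the helper name auth_user are made up for illustration. The point is only to show the \S+ version failing on a username with a space (and on a forged "dws [") while the .+ version still picks out the name, and to show the match result being checked rather than assumed.

```perl
use strict;
use warnings;

# Hypothetical common-log-format lines; hosts, users, and paths are invented.
my @lines = (
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - dws [10/Oct/2000:13:55:37 -0700] "GET /private/ HTTP/1.0" 200 512',
    '127.0.0.1 - John Doe [10/Oct/2000:13:55:38 -0700] "GET /private/ HTTP/1.0" 200 512',
    '127.0.0.1 - dws [ [10/Oct/2000:13:55:39 -0700] "GET / HTTP/1.0" 200 100',
);

# Assumed shape of the rest of the line: [timestamp] "request" status bytes.
# The strict timestamp pattern is what lets the forged "dws [" case resolve.
my $tail = qr{\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [-+]\d{4})\] "([^"]*)" (\S+) (\S+)$};

my $strict  = qr{^(\S+) (\S+) (\S+) $tail};   # conventional: no whitespace in user
my $lenient = qr{^(\S+) (\S+) (.+) $tail};    # the fix: user may contain whitespace

# Return the auth_user field, or undef if the line does not match.
sub auth_user {
    my ($re, $line) = @_;
    return $line =~ $re ? $3 : undef;
}

for my $line (@lines) {
    my $s = auth_user($strict,  $line);
    my $l = auth_user($lenient, $line);
    printf "strict: %-12s lenient: %s\n",
        defined $s ? "'$s'" : 'NO MATCH',
        defined $l ? "'$l'" : 'NO MATCH';
}
```

Whichever pattern you use, the defined check is the part worth keeping: a line that fails to match should be counted or logged, not silently parsed into stale or empty variables.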
In reply to A rare, insidious logfile parsing pitfall by dws