PerlMonks  

Re^7: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

by BrowserUk (Patriarch)
on Jun 17, 2015 at 23:32 UTC


in reply to Re^6: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
in thread Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

then yay, but I'm not sure this is actually true

Read the spec.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^8: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
by wrog (Friar) on Jun 18, 2015 at 16:17 UTC
    • which says nothing about logname and user,
    • nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components (hint: it doesn't).
      "...3 space-separated components (hint: it doesn't)"

      I used this: <LogFormat "%h %l %u %t \"%r\" %>s %b" common>, as in the example by kcott.

      From the docs:

      "First, the method used by the client is GET. Second, the client requested the resource /apache_pb.gif, and third, the client used the protocol HTTP/1.0."

      Hence the request field will always look like this: "GET /karls.beer HTTP/1.0".

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        (yay, somebody is finally quoting the right spec)

        Key sentence you apparently skipped over:

        The log file entries produced in CLF will look something like this
        emphasis mine. In other words, this was an example. Not all commands are GETs, and even for the ones that are, you'll still have assholes out there who are not following the protocol (keeping in mind that the whole freaking point of a log file is to preserve what's actually happening so that you can, say, diagnose stuff that's going wrong...)
      nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components (hint: it doesn't).

      You read the wrong spec, or you misread the right one:

      The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.

      Request-Line = Method SP Request-URI SP HTTP-Version CRLF
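For illustration only, here is a minimal Python sketch of what a strict reading of the Request-Line grammar quoted above would accept. The pattern and the function name `split_request_line` are assumptions made for this example; the token classes are simplified and do not reproduce the full RFC grammar.

```python
import re

# Strict reading of the quoted grammar:
#   Request-Line = Method SP Request-URI SP HTTP-Version CRLF
# Simplification (assumption): method and URI are any run of characters
# other than space, CR, or LF; the version is HTTP/<digits>.<digits>.
REQUEST_LINE = re.compile(r'([^ \r\n]+) ([^ \r\n]+) (HTTP/\d+\.\d+)\r\n')

def split_request_line(line):
    """Return (method, uri, version), or None if the line does not conform."""
    m = REQUEST_LINE.fullmatch(line)
    return m.groups() if m else None

# A conforming line splits into exactly three elements.
print(split_request_line("GET /apache_pb.gif HTTP/1.0\r\n"))
# A line with an extra space does not match the grammar at all.
print(split_request_line("GET /a b HTTP/1.0\r\n"))
```

Note that this only describes what a conforming client sends; it makes no claim about what a server chooses to write into its log.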

        1. That's the HTTP spec, which is all very nice but is not the Apache Log spec.
        2. Just assuming for the sake of argument that the HTTP command line is indeed being copied verbatim into that field, there's also the question of whether all of the clients out there will actually be following the spec; we live in a world with script kiddies and DDoS hobbyists, after all (hint: I'm guessing there's a reason the Apache folks saw fit to double-quote that field).

        In other news, on my real, live Apache 2.4 webserver using the default-format log, I see lines like this:

        10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0
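A line like the one above is a reason to capture the quoted request field whole and split it only afterwards. The following is a rough Python sketch, not anybody's production parser. It assumes the common format `%h %l %u %t "%r" %>s %b`, assumes host, logname, and user contain no spaces, and assumes embedded quotes in the request arrive backslash-escaped. The names `CLF` and `parse_line` are made up for this example.

```python
import re

# One pattern per field of: %h %l %u %t "%r" %>s %b
# The request (%r) is taken as "everything between the quotes",
# without assuming it has three space-separated parts.
CLF = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line is not in this format."""
    m = CLF.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    # Split the request only when it really has three components.
    parts = rec["request"].split(" ")
    if len(parts) == 3:
        rec["method"], rec["uri"], rec["protocol"] = parts
    else:
        rec["method"] = rec["uri"] = rec["protocol"] = None
    return rec

rec = parse_line('10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0')
print(rec["request"], rec["status"], rec["method"])
```

With this shape, a request that is not three components still yields a record (with the raw request preserved) instead of silently dropping the line.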
