quick glance of the source
You didn't look closely enough.
This regex:
q{([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};
It won't match "-", because it expects, and requires, at least two space-delimited fields within the quotes, and allows for an optional third.
Note also that both ID fields are expected to match [^\s]* (I guess he's not aware of \S; and it should at least be + rather than *, which could be an indication of his Perl experience).
So, a "proper parser" would break too. Maybe it has a fallback for when the regex fails; but equally, it's simple to code a fallback for the whitespace split.
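To check that claim, here is a quick Perl test of the pattern above (wrapped in qr{} rather than q{} so it can be used directly), run against an ordinary request line and the 408 line from this thread. The ordinary line is made-up sample data.

```perl
use strict;
use warnings;

# The pattern quoted above, joined back onto one line.
my $re = qr{([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};

# Made-up Common Log Format line with a normal three-part request.
my $normal  = '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "GET /index.html HTTP/1.1" 200 512';
# The 408 line from this thread: the request field is just "-".
my $timeout = '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0';

my $normal_ok  = $normal  =~ $re ? 1 : 0;
my $timeout_ok = $timeout =~ $re ? 1 : 0;

printf "normal line:  %s\n", $normal_ok  ? 'matched' : 'no match';
printf "timeout line: %s\n", $timeout_ok ? 'matched' : 'no match';
```

The first line matches; the second prints "no match", because after the opening quote the pattern insists on two space-separated tokens and then a closing quote, and "-" only supplies one.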
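As a small illustration of the \S point: [^\s] and \S are the same character class; the real difference is the quantifier, since * happily accepts an empty field. The sample values below are made up.

```perl
use strict;
use warnings;

# [^\s] and \S both mean "any non-whitespace character".
# With *, an empty string also counts as a valid field; with +, it does not.
my @cases = ('-', 'frank', '');

for my $field (@cases) {
    my $star = $field =~ /\A[^\s]*\z/ ? 'yes' : 'no';
    my $plus = $field =~ /\A\S+\z/    ? 'yes' : 'no';
    printf "%-8s [^\\s]*: %-3s  \\S+: %s\n", "'$field'", $star, $plus;
}
```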
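A minimal sketch of what that whitespace split, plus a fallback, might look like. It assumes Common Log Format input where only the quoted request can contain spaces; parse_clf_line and its hash keys are hypothetical names chosen for illustration, not taken from any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): parse one Common Log Format
# line by splitting on whitespace. Assumes only the quoted request field
# can contain spaces, and special-cases a bare "-" request.
sub parse_clf_line {
    my ($line) = @_;
    my @f = split ' ', $line;    # ' ' = awk-style split on runs of whitespace

    # 8 fields: request logged as "-" (e.g. a 408); 10 fields: "METHOD PATH PROTOCOL".
    # Anything else (including two-part HTTP/0.9 requests) is left to the caller.
    return unless (@f == 8 && $f[5] eq '"-"') || @f == 10;

    my %rec;
    @rec{qw(host logname user)} = @f[0 .. 2];
    ($rec{time} = "$f[3] $f[4]") =~ tr/[]//d;    # drop the surrounding brackets

    if (@f == 8) {
        @rec{qw(method path protocol)} = ('-', undef, undef);
        @rec{qw(status bytes)}         = @f[6, 7];
    }
    else {
        ($rec{method} = $f[5]) =~ s/^"//;        # strip the opening quote
        $rec{path} = $f[6];
        ($rec{protocol} = $f[7]) =~ s/"\z//;     # strip the closing quote
        @rec{qw(status bytes)} = @f[8, 9];
    }
    return \%rec;
}

my $r = parse_clf_line('10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0');
print "$r->{status} $r->{method}\n";    # prints "408 -"
```

Returning nothing for an unrecognised shape keeps the decision (skip, log, or try something heavier) with the caller, which is the same kind of back-up plan a regex-based parser would need.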
So let's review:
- The OP asked about using pack & unpack, and a couple of early responders posted positive-sounding confirmations.
- I countered by informing him that pack & unpack were completely inappropriate for the task, and suggested split as a starting point for his "personal learning experience".
- You pop up and, rather than trying to help the OP, you attempt to pick holes in my post, despite the fact that its purpose was to save the OP from wasting time with pack & unpack.
- So, I reminded you: "He did ask for a learning exercise; not a pre-solved solution.".
- So you come back with this guess: "(or if Apache really does go to some pains to make sure spaces never show up in the various log fields -- say by always representing them as + or %20 -- then yay, but I'm not sure this is actually true.)".
Which is demonstrably wrong!
- You retort with: "which says nothing about logname and user,".
Look at the regex above! Wrong again.
- And "nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components ".
Also wrong!
- So then you throw "10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0" into the mix.
And, as I've shown above, that would (without special handling) break most pre-solved solutions, which, I'll remind you, the OP explicitly didn't want.
And which could just as easily be handled by a special case with the split version.
You know, as a part of the personal learning experience!
A big part of which might be that, having tried it for himself, he decides to opt for a pre-solved solution.
Or he might decide to write his own CPAN module that does it better than any of the existing ones.
That's his choice.
All I did was short circuit his learning, by informing him that pack & unpack were definitely the wrong tools to start with.
So, here we are 13 levels deep, and you've become boring. No attempt to help the OP; just banging on about stuff it seems you barely understand.
So, I'm bored and done. 'Twas fun.
Update: I forgot this little gem. You offered this wishy-washy suggestion, "or using Text::CSV or somesuch", but then later suggest that split will break because "which says nothing about logname and user,", completely oblivious to the fact that if either ID contained spaces, it would break that module too!