Re: Pulling by regex II
by PhiRatE (Monk) on Dec 13, 2002 at 11:11 UTC
Right. Let's get down to it. You (and I say "you" in a friendly, you're-not-the-only-one, everyone-does-this way) have a problem of bad priorities, so let's list the priorities you have stated so far:
Now, a few of these don't match. Even more don't add up when you add priorities that you *should* have, notably "secure".
So, let's detail some of the issues we have with the code thus far:
Under these circumstances, without access to help, you should be getting your priorities in order. While it depends on your particular position, the following is a good start:
What does this mean? Well, the first thing it means is that if a well-known, already-debugged module is around that can do any part of the problem you're trying to solve, use it. Performance is waaay down on the list, so just use it. If it really turns out too slow in use, *then* you worry about it.
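To make that concrete, here is a minimal sketch of letting a module do the date handling instead of a hand-rolled month table. It assumes Time::Piece (a core module in modern Perls, not necessarily what you have installed) and a made-up timestamp in the usual web-server log layout; adjust the format string to whatever your logs really contain.

```perl
use strict;
use warnings;
use Time::Piece;   # core module since Perl 5.10

# Hypothetical timestamp in the common web-server log layout.
my $stamp = '13/Dec/2002:11:11:00';

# strptime understands the abbreviated month name (%b), so no
# hand-written name-to-number table is needed.
my $t = Time::Piece->strptime($stamp, '%d/%b/%Y:%H:%M:%S');

print $t->ymd, "\n";   # date as YYYY-MM-DD
print $t->mon, "\n";   # month as a number
```

The point is not this particular module, just that the parsing and the month conversion are somebody else's debugged code.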
The second thing it means is a quick investigation of the language features that can help you out with debugging and security. Taint mode (-T), strict and warnings should all be on, and you shouldn't move an inch until your code is working with all of them.
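For reference, the top of a script with all three switched on might look roughly like this. It is only a sketch: the `$total` variable is made up for illustration, and the `${^TAINT}` check assumes Perl 5.8 or later.

```perl
#!/usr/bin/perl
# Run with taint mode on:  perl -T script.pl
# (or put -T on the #! line when the script is executed directly)
use strict;     # undeclared variables and symbolic refs become errors
use warnings;   # uninitialised values, suspicious constructs etc. are reported

# ${^TAINT} is true when the interpreter was started with -T.
my $taint_on = ${^TAINT} ? 1 : 0;
warn "taint mode is off, restart with -T\n" unless $taint_on;

# With strict on, this would not compile because $total is never declared:
#   $total = 0;
my $total = 0;
$total += $_ for 1 .. 3;
print "total: $total\n";
```

Taint mode cannot be turned on from inside a running script, which is why it lives on the command line or the #! line rather than in a `use` statement.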
Now, if you had done this, most of the code above, which isn't working, wouldn't even exist. You'd have something a fraction of the size that worked fine. Maybe it'd be a bit slow, maybe not, but if it was, you'd be posting to PerlMonks saying "Hey, here's my working code, how can I make it go a bit faster?", and then people like Abigail-II would have been able to tell you exactly how a database would aid your cause, even when the logs are in text file format. But we're not there yet.
This is by no means perfect. I didn't have enough log examples to work from to be sure the regex will work well, but it has all the properties of a good first effort: all the necessary warnings and taint checks are turned on, the relevant modules are used (there is, I think, an Apache log parsing module too, but I wasn't sure you were using Apache), all user input is verified, and no more code than strictly necessary is used.
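On the "all user input is verified" point, a minimal sketch of what that can look like, assuming a hypothetical `clean_date` helper and a date that arrives as YYYY-MM-DD (neither is from your code, so treat both as placeholders):

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a YYYY-MM-DD date and return it,
# or nothing (undef in scalar context) if the input does not match.
# Under -T, values taken from regex captures are untainted, so only
# the checked value travels further into the program.
sub clean_date {
    my ($raw) = @_;
    return unless defined $raw;
    return unless $raw =~ /\A(\d{4})-(\d{2})-(\d{2})\z/;
    my ($y, $m, $d) = ($1, $2, $3);
    return unless $m >= 1 && $m <= 12 && $d >= 1 && $d <= 31;
    return "$y-$m-$d";
}

for my $input ('2002-12-13', '2002-13-01', '2002-12-13; rm -rf /') {
    my $date = clean_date($input);
    print defined $date ? "ok: $date\n" : "rejected: $input\n";
}
```

The pattern is anchored with \A and \z on purpose, so a valid-looking prefix followed by junk is rejected rather than quietly trimmed. The range check is deliberately crude (it would let 2002-02-31 through), so a date module should still do the real validation.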
The end result: no weird month name/number conversions, about 30 seconds of debugging, and a much better product. I highly recommend sitting down before your next project and listing your priorities as above. It will help focus you on what really matters first, and what can be tweaked second.