http://www.perlmonks.org?node_id=861678

pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Ahoy, ye Monks.

I've recently had a few needs to take a text file full of data, usually a server log file, and analyse it in some way. It's pretty easy, if I know what I'm looking for, to write a script to tell me, say, how many times page X was loaded during the month of July. What's less obvious is what to do with the data when I don't know what I'm looking for yet. I'm looking for patterns, but I don't know what they are.

Right now, I've got two theoretically identical DHCP servers, except one of them is getting half the traffic of the other, which doesn't make sense. I want to analyse my logs and see if I can figure out a pattern. Maybe the one with half the traffic is getting no requests from computers in a particular subnet? Maybe it's only getting a certain type of request? What time of day has the most requests?

Basically, I'm trying to figure out what form to put my data into in order to ask any question I want.

I'm thinking the best way to handle this is to load all the data into a SQL DB and then run SQL queries against it to ask the questions I come up with.
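For concreteness, here's a rough sketch of what I have in mind, assuming DBI and DBD::SQLite are installed. The log line format, the sample lines, the requests table and the count_by helper are all made up for illustration; my real dhcpd logs may well look different, and this version only keeps lines that mention an IP address.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database; swap in dbname=dhcp.db to keep it on disk.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: one row per log line, one column per field
# I might want to group or filter on later.
$dbh->do(<<'SQL');
CREATE TABLE requests (
    server TEXT,
    hour   INTEGER,
    type   TEXT,
    subnet TEXT
)
SQL

my $insert = $dbh->prepare(
    'INSERT INTO requests (server, hour, type, subnet) VALUES (?, ?, ?, ?)');

# Invented sample lines; real code would read the log files instead.
my @lines = (
    'Jul  1 09:15:02 dhcp1 dhcpd: DHCPREQUEST for 10.1.2.3 from 00:11:22:33:44:55 via eth0',
    'Jul  1 09:20:41 dhcp1 dhcpd: DHCPREQUEST for 10.1.3.7 from 00:11:22:33:44:66 via eth0',
    'Jul  1 14:02:10 dhcp1 dhcpd: DHCPACK on 10.1.2.3 to 00:11:22:33:44:55 via eth0',
    'Jul  1 09:16:30 dhcp2 dhcpd: DHCPREQUEST for 10.1.2.9 from 00:11:22:33:44:77 via eth0',
    'Jul  1 10:05:00 dhcp2 dhcpd: DHCPDISCOVER from 00:11:22:33:44:88 via eth0',
);

$dbh->begin_work;    # one transaction makes bulk inserts much faster
for my $line (@lines) {
    my ($hour, $server, $type, $subnet) = $line =~ m{
        ^\w{3}\s+\d+\s+(\d\d):\d\d:\d\d\s+   # timestamp, capture the hour
        (\S+)\s+dhcpd:\s+                    # server name
        (DHCP[A-Z]+)\s+                      # message type
        (?:for|on)\s+(\d+\.\d+\.\d+)\.\d+    # first three octets as the subnet
    }x or next;                              # skip lines that do not match
    $insert->execute($server, $hour + 0, $type, $subnet);
}
$dbh->commit;

# Count rows grouped by any combination of columns.
# Column names are interpolated into the SQL, so pass trusted names only.
sub count_by {
    my ($dbh, @cols) = @_;
    my $cols = join ', ', @cols;
    return $dbh->selectall_arrayref(
        "SELECT $cols, COUNT(*) FROM requests GROUP BY $cols ORDER BY $cols");
}

# Example question: how many requests per server per subnet?
for my $row (@{ count_by($dbh, 'server', 'subnet') }) {
    print join("\t", @$row), "\n";
}
```

The idea is that once the rows are loaded, each new question (per hour, per request type, per server and subnet) becomes a different argument list to count_by, or a one-off SELECT, rather than a new parsing script.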

So, the question I'm really trying to get to is: what's a good strategy when you know you want to analyse some data, but you don't know specifically what you're going to look for? If I'm right that the first step should involve stuffing the data into a SQL database, are there generic modules to help me do this? Or am I totally missing the boat, and there are better ways to handle this? Or maybe I'm trying to be too sophisticated, and the most efficient thing to do is change the code to ask and answer a different question each time?

I hope that made some sense....

--Pileofrogs