Before I start, a quick confession: learning Perl has been on my todo list for years. I can laboriously read through a Perl script without getting too lost, but I wouldn't even be a Friar of Perl yet.
I was hoping to use this task as a way to make myself learn some Perl, but it's apparently too far beyond me. I almost have a working solution using shell and awk, but by all reports this is the type of problem where Perl can blow those tools out of the water. I'd really like to gain some understanding of Perl, while hopefully also getting a solution for this mountain of data I have to convert.
The problem in brief (well, I was going to say brief, but it seems kind of long now that I've typed it out):
I have been handed about a year's worth of logs collected every hour from a server.
This is the base iteration of one of the runs (it runs every 5 minutes every hour)
2350
id pool type rid rset min max size used load
5 SUNWtmp_serverxd1z1 pset 1 SUNWtmp_serverxd1z1 104 104 104 0.00 6.25
4 SUNWtmp_serverxd1z2 pset 2 SUNWtmp_serverxd1z2 16 16 16 0.00 0.91
0 pool_default pset -1 pset_default 24 66K 24 0.00 1.74
id pool type rid rset min max size used load
5 SUNWtmp_serverxd1z1 pset 1 SUNWtmp_serverxd1z1 104 104 104 5.01 6.21
4 SUNWtmp_serverxd1z2 pset 2 SUNWtmp_serverxd1z2 16 16 16 0.97 0.91
0 pool_default pset -1 pset_default 24 66K 24 3.73 1.78
> output truncated, but it goes on for 50 lines from the prior timestamp, until the next one.
Each run is 50 lines long (they all get combined into a file of around 14400 lines for each day), with the field at the front of each line being the date derived from the file name.
Here is what they want it to look like. The exact whitespace between fields doesn't seem to matter, just the relative field positions, including the new field "int". It is shown here iterating to 02, but it would actually only change once every 50 lines (one complete data collection run), and then start back at 01.
date hhmm int id pool type rid rset min max size used load
20121105 2350 01 5 SUNWtmp_serverxd1z1 pset 1 SUNWtmp_serverxd1z1 104 104 104 0.00 6.25
20121105 2350 01 4 SUNWtmp_serverxd1z2 pset 2 SUNWtmp_serverxd1z2 16 16 16 0.00 0.91
20121105 2350 01 0 pool_default pset -1 pset_default 24 66K 24 0.00 1.74
date hhmm int id pool type rid rset min max size used load
20121105 2350 02 5 SUNWtmp_serverxd1z1 pset 1 SUNWtmp_serverxd1z1 104 104 104 5.01 6.21
20121105 2350 02 4 SUNWtmp_serverxd1z2 pset 2 SUNWtmp_serverxd1z2 16 16 16 0.97 0.91
20121105 2350 02 0 pool_default pset -1 pset_default 24 66K 24 3.73 1.78
I've tried a few sed and awk one-liners, but have come to the sad realization that not only are they not all that good for this kind of scenario, I also really want to see how this can be done in Perl. I've never had to manipulate text in any way more complex than a one-liner could handle. At this point I see this file needing something more complex than my one-liners, but perhaps not more complex than a Perl Monk's one-liner.
The text in the date column is derived from the name of the incoming file, e.g. 20121003-poolstat_a_serverd1z0.txt. The time is the 4-digit number that appears every 50 lines.
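To make the file-name rule concrete, here is a minimal sketch in Python (not the Perl being asked about, just an illustration). The function name `date_from_filename` is made up for this example, and it assumes every file name starts with an 8-digit date followed by a hyphen, as in the sample name above.

```python
import re
from pathlib import Path


def date_from_filename(path):
    """Return the leading 8-digit date of a file name, or None if absent.

    Assumption: names look like 20121003-poolstat_a_serverd1z0.txt,
    i.e. eight digits followed by a hyphen.
    """
    # Path(...).name drops any directory part before matching.
    match = re.match(r"(\d{8})-", Path(path).name)
    return match.group(1) if match else None


print(date_from_filename("20121003-poolstat_a_serverd1z0.txt"))
```

Returning None for a name that doesn't fit lets the caller decide whether to skip the file or stop with an error.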
The int field needs to iterate each time poolstat is run. See below for details.
In summary, these are the only fields that need to be added to the mostly numeric lines:
field 1, the 8-digit date, derived from the file name, e.g. 20121003-poolstat_a_serverd1z0.txt
field 2, the 4-digit time that appears inside the file every 50th line.
field 3 the iteration count, as follows:
Based on digits 3 and 4 of the 4 digit time.
00-05-10-15-20-25-30-35-40-45-50-55 minute of run.
01-02-03-04-05-06-07-08-09-10-11-12 iteration.
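The minute-to-iteration table above boils down to integer division by 5, plus 1. A minimal Python sketch of that arithmetic follows; the name `iteration_for` is hypothetical, and it assumes the time is always a 4-digit hhmm string whose minutes fall on a 5-minute boundary (other minute values are simply rounded down to the previous slot).

```python
def iteration_for(hhmm):
    """Map a 4-digit hhmm string to the two-digit iteration field.

    Minute 00 -> "01", minute 05 -> "02", ... minute 55 -> "12".
    """
    minute = int(hhmm[2:4])          # digits 3 and 4 of the time
    return f"{minute // 5 + 1:02d}"  # zero-padded to two digits


for stamp in ("0000", "0905", "2350", "1255"):
    print(stamp, iteration_for(stamp))
```

Under this rule a run stamped 2350 gets iteration 11, so the 01 and 02 in the sample output above are only placeholders for the field's position.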
The rest is just printing out existing fields. It's a matter of getting those onto a line, then using an awk (or other) command to print out the other 10 fields, all while keeping track of the current iteration.
And just to keep things complex, the mostly alpha header line also needs 3 new fields:
"date hhmm int"
The rest of the fields are headers supplied by poolstat, but they need to somehow be appended to the "date hhmm int" string without something weird happening.
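Putting the pieces together, the whole conversion can be sketched as a single pass that remembers the most recent time stamp. This is a Python illustration under stated assumptions, not a definitive implementation: the name `convert` is made up, the input is assumed to contain only three kinds of lines (a lone 4-digit time, a header starting with "id", and pool data lines), and the iteration is taken from the minutes of the time stamp as described above.

```python
import re

NEW_HEADER_FIELDS = ["date", "hhmm", "int"]


def convert(lines, date):
    """Prefix each poolstat line with date, time and iteration fields."""
    out = []
    hhmm = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1 and re.fullmatch(r"\d{4}", fields[0]):
            hhmm = fields[0]  # a new run starts; remember its time
            continue
        if hhmm is None:
            continue  # nothing before the first time stamp can be labelled
        if fields[0] == "id":
            # Header line: prepend the three new column names.
            out.append(" ".join(NEW_HEADER_FIELDS + fields))
        else:
            # Data line: prepend date, time and the derived iteration.
            iteration = f"{int(hhmm[2:4]) // 5 + 1:02d}"
            out.append(" ".join([date, hhmm, iteration] + fields))
    return out


sample = [
    "2350",
    "id pool type rid rset min max size used load",
    "0 pool_default pset -1 pset_default 24 66K 24 0.00 1.74",
]
for row in convert(sample, "20121105"):
    print(row)
```

Splitting on whitespace and rejoining with single spaces relies on the relative field order being all that matters; the date argument would come from the file name.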