Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Daily Log Audit (regex issue)

by blacksmith (Hermit)
on Oct 21, 2002 at 23:09 UTC ( #206952=perlquestion: print w/replies, xml ) Need Help??

blacksmith has asked for the wisdom of the Perl Monks concerning the following question:

First off, I hope these readmore tags work. Well, I am sure they will work. I hope the formatin of my log will pass into the readmore tags correctly. If not, "Hello, Nodereaper. Hope you are doing well today."

I am having a time trying to come up with the best solution for an auditing issue. I am needing to parse through a list log created by a VSE system. It is in the readmore tags at the bottom.

I am needing to create a script that will parse through this log and let me know of all the jobs that completed with or without satisfactory return codes or abends along with times. I have considered a list of jobs in a text file that can be accessed. I have also considered the possibilty of using <STDIN> user input to search the log and return a file containing the start of the job through the end of the job, showing only the console messages in that partition.

I know that this is possible with Perl, but like most Perl hackers, I want efficiency. I have considered using a pattern match for the start of a particular job and then use "sysread" and "seek" to parse out the console messages for that particular partition. (For anyone not familiar with partitions and console messages on a VSE system. The partition is labeled by the "BG", "P1", "Y1", or just about any two alphanumeric characters right after the message number or count.)

The only issue is that I will need to continue my search on the next line following the pattern match for any other jobs that are listed in my search. For example, if I know that the jobs "TPDELDFN" and "TPCOPY" both run on the 30th of Sept, my search would pick up "TPDELDFN" and begin searching and printing all lines containing the partition "P1" until an "EOJ" pattern match, but it would miss the "TPCOPY" job running in partition "BG" if the search would not continue back at line 631.

I have also thought about separating all the different partitions and then searching them individually since only one job at a time can run in a partition, but of course this seems like a time consuming process and very difficult since the number of partitions can vary per day and not all jobs run in static partition. I have also ruled out the possibility of searching this entire log, which is usually close to 3,000 KB and approximately 20,000+ lines, for each individual job due to time constraints.

Any ideas or directions on where and how I could go about this would be appreciated. Thanks. blacksmith

Sample data follows:
612 F1 0001 1Q34I RDR WAITING FOR WORK ON 00C + 11:18:10 09/30/2002 613 BG 0001 1Q47I BG AR400C 38157 FROM LOCAL U='LICKE ' +, TIME=11:18:10 11:18:10 09/30/2002 614 BG 0000 // JOB AR400C MOVE A PROGRAM TO PRODUCTION LIBRARY + 11:18:10 09/30/2002 DATE 09/30/2002, CLOCK 11/18/10 + 11:18:10 09/30/2002 615 BG 0000 EOJ AR400C MAX.RETURN CODE=0000 + 11:18:11 09/30/2002 DATE 09/30/2002, CLOCK 11/18/11, DURATION 00/00/01 + 11:18:11 09/30/2002 616 BG 0001 1Q34I BG WAITING FOR WORK + 11:18:11 09/30/2002 617 BG 0001 1Q47I BG TPCOPY 38159 FROM (02TP) U='TPASCHAL +', 11:18:57 09/30/2002 TIME=11:18:57 + 11:18:57 09/30/2002 618 BG 0000 // JOB TPCOPY + 11:18:57 09/30/2002 DATE 09/30/2002, CLOCK 11/18/57 + 11:18:57 09/30/2002 619 BG 0000 * STEP SORT COPY DISK + 11:18:57 09/30/2002 620 BG-0000 // PAUSE PLEASE MOUNT THE 'CONCORD.DATA' TAPE (09/11/02 +) 11:18:57 09/30/2002 621 0 + 11:19:03 09/30/2002 98JG 622 BG 0000 ILU323I J1 TPCOPY COPY COMPLETE, IN 94, OUT 94 + 11:19:06 09/30/2002 623 BG 0000 EOJ TPCOPY MAX.RETURN CODE=0000 + 11:19:07 09/30/2002 DATE 09/30/2002, CLOCK 11/19/07, DURATION 00/00/10 + 11:19:07 09/30/2002 624 BG 0001 1Q34I BG WAITING FOR WORK + 11:19:07 09/30/2002 625 MTC RUN,381 + 11:19:22 09/30/2002 98JG 626 AR 0015 1I40I READY + 11:19:22 09/30/2002 98JG 627 P1 0052 1T20I SYS006 HAS BEEN ASSIGNED TO X'FEE' (PERM) + 11:20:32 09/30/2002 628 P1 0052 1T20I SYS008 HAS BEEN ASSIGNED TO X'FED' (PERM) + 11:20:32 09/30/2002 629 P1 0001 1Q47I P1 TPDELDFN 38160 FROM (02TP) U='TPASCH + ', 11:20:32 09/30/2002 TIME=11:20:32 + 11:20:32 09/30/2002 630 P1 0052 // JOB TPDELDFN + 11:20:32 09/30/2002 DATE 09/30/2002, CLOCK 11/20/32 + 11:20:32 09/30/2002 631 P1 0052 EP017 EPIC SYSTEM IS ACTIVE IN THIS PARTITION + 11:20:33 09/30/2002 632 P1 0052 1T20I SYS006 HAS BEEN ASSIGNED TO X'FEE' (PERM) + 11:20:35 09/30/2002 633 P1 0001 1Q3EI DYNAMIC CLASS 'P' WAITING FOR WORK + 11:20:35 09/30/2002 634 F4 0163 TVCE64022I - TASK WAS FORCED - TASK NO. 8782 + 11:21:22 09/30/2002 635 F4 0163 DFHFC0300 PRODCICS (FP40 T30B) PURGE DEFERRED DUE TO IN +COMPLETE I/O 11:21:22 09/30/2002 OPERATION ON VSAM FILE 'FPHIST '. + 11:21:22 09/30/2002 636 F4 0163 DFHFC0305 PRODCICS MESSAGE DFHFC0300 FILE 'FPHIST ' DS +NAME 'FP. 11:21:22 09/30/2002 OPTIONS SPECIFIED: NEW + 10/01/2002 PAGE 19 COUNT MESSAGE TEXT + CLOCK DATE CONSOLE MEMBER.HISTORY'. + 11:21:22 09/30/2002 637 BG 0001 1Q47I BG TPCOPY 38161 FROM (02TP) U='TPASCHAL +', 11:22:24 09/30/2002 TIME=11:22:24 + 11:22:24 09/30/2002 638 BG 0000 // JOB TPCOPY + 11:22:24 09/30/2002 DATE 09/30/2002, CLOCK 11/22/24 + 11:22:24 09/30/2002 639 BG 0000 * STEP SORT COPY DISK + 11:22:24 09/30/2002 640 BG-0000 // PAUSE PLEASE MOUNT THE 'CONCORD.DATA' TAPE (09/12/02 +) 11:22:24 09/30/2002 641 0 + 11:24:42 09/30/2002 98JG 642 BG 0000 ILU323I J1 TPCOPY COPY COMPLETE, IN 94, OUT 94 + 11:24:48 09/30/2002 643 BG 0000 EOJ TPCOPY MAX.RETURN CODE=0000 + 11:24:48 09/30/2002 DATE 09/30/2002, CLOCK 11/24/48, DURATION 00/02/23 + 11:24:48 09/30/2002 644 BG 0001 1Q34I BG WAITING FOR WORK + 11:24:48 09/30/2002 645 F4 0163 TVCE64022I - TASK WAS FORCED - TASK NO. 12528 + 11:25:32 09/30/2002 646 F4 0163 DFHFC0300 PRODCICS (FP40 T30B) PURGE DEFERRED DUE TO IN +COMPLETE I/O 11:25:32 09/30/2002 OPERATION ON VSAM FILE 'FPHIST '. + 11:25:32 09/30/2002 647 F4 0163 DFHFC0305 PRODCICS MESSAGE DFHFC0300 FILE 'FPHIST ' DS +NAME 'FP. 11:25:32 09/30/2002 MEMBER.HISTORY'. + 11:25:32 09/30/2002 648 Y1 0052 EOJ TPDELDFN MAX.RETURN CODE=0008 + 11:25:42 09/30/2002 DATE 09/30/2002, CLOCK 11/25/42, DURATION 00/05/10 + 11:25:42 09/30/2002 649 Y1 0052 1T20I SYS008 HAS BEEN ASSIGNED TO X'FED' (PERM) + 11:25:42 09/30/2002 650 Y1 0001 1Q47I Y1 DITE02KL 38163 FROM (02KL) , TIME=11:25:42 + 11:25:42 09/30/2002 651 Y1 0052 // JOB DITE02KL DITTO/FS + 11:25:42 09/30/2002 DATE 09/30/2002, CLOCK 11/25/42 + 11:25:42 09/30/2002 652 Y1 0052 EP017 EPIC SYSTEM IS ACTIVE IN THIS PARTITION + 11:26:00 09/30/2002 653 Y1 0052 EOJ DITE02KL MAX.RETURN CODE=0000 + 11:26:13 09/30/2002 DATE 09/30/2002, CLOCK 11/26/13, DURATION 00/00/31 + 11:26:13 09/30/2002 654 Y1 0001 1Q3EI DYNAMIC CLASS 'Y' WAITING FOR WORK + 11:26:14 09/30/2002 655 X2 0171 APT933I PSFDEED PSFX2 DEED3 PRINTER INFORMATION, DEV=43 +32, ADDR=10.53. 11:27:00 09/30/2002 24.32, CPUID=FF01561A20660000, ATTACH=PSF + 11:27:00 09/30/2002 656 F1 0001 1QY2I DEVICE DEED3 WAITING FOR WORK, DDS=SYSPSFX2 + 11:27:08 09/30/2002 657 F1 0001 1QY2I DEVICE DEED3 WAITING FOR WORK, DDS=SYSPSFX2 + 11:27:19 09/30/2002 658 Y1 0052 1T20I SYS006 HAS BEEN ASSIGNED TO X'FEE' (PERM) + 11:27:29 09/30/2002

Edited: ~Mon Oct 21 23:29:58 2002 (GMT) by footpad: Added <CODE> tags for legibility, per Consideration

Replies are listed 'Best First'.
Re: Daily Log Audit (regex issue)
by jlongino (Parson) on Oct 22, 2002 at 04:20 UTC
    I would consider a brute force method to begin with and then refactor from there. Here's where I'd start:

    First, you need to decide how to start the processing:

    • Begin at a point where all partitions are quiet?
    • Accept the fact that some jobs may have already started when you begin and either ignore them or check the RC and report accordingly (don't worry about start information)?

    I'd parse the log file line by line and use a hash to keep the necessary info for the jobs in progress:

    $jobs{$partition}{jobname} $jobs{$partition}{starttime} $jobs{$partition}{endtime} $jobs{$partition}{rc}

    You examine each line and determine:

    • Is this the beginning of a new job? If so, store jobname, starttime.
    • Is this the end of a job? If so, have I recorded the start info for this job? if so, store endtime, rc, report, delete entry.
    • Ignore if neither of the above is true

    Determine your strategy for what happens when you reach EOF:

    • Ignore the jobs started but not finished (or at least report that fact with the data you have)?
    • Choose an ending point in the log file where the partitions are all quiet?

    The regexen used to parse each line should be fairly straight forward, you only have to read the file once, and you keep storage to a minimum. Would this be the most efficient way to do it? I don't know but as I said, during the coding process, you may gain some insight to try something different.

    --Jim

Re: Daily Log Audit (regex issue)
by BrowserUk (Pope) on Oct 22, 2002 at 07:30 UTC

    I set out to try something a bit different and ended up doing pretty much what jlongino described above. Nothing very clever, except maybe that as (mostly) your data appears to be fixed-format records. You can greatly simplify your regexes by first chopping the lines up using unpack.

    The following code uses (simplified) mocked-up data and will need some considerable work, but it does show the basic technique, and may be of some use as a starting point.

    #! perl -sw use strict; my %info; while (<DATA>) { my ($partn, $msg, $timestamp ) = unpack 'x4 A2 x6 A63 A19', $_; $info{$partn}{started}= $timestamp, $info{$partn}{job} = $1 if $msg =~ m!^// JOB (.{8})!; print 'Job ', $info{$partn}{job}, ' started in partition ', $partn, ' at ', $info{$partn}{started}, $/, ' and finished at ', $timestamp, ' with a return code of ', $1, $/ if $msg =~ m!^EOJ.+?MAX.RETURN CODE=(\d{4})! ; } =pod #Output c:\test>206952 Job NEQZQCPB started in partition P1 at 07:56:18 10/22/200 and finished at 08:21:51 10/22/200 with a return code of 0001 Job HGPKIOEV started in partition ZK at 08:58:19 10/22/200 and finished at 07:56:05 10/22/200 with a return code of 0002 Job SPZDIAGL started in partition Q1 at 08:39:12 10/22/200 and finished at 09:00:33 10/22/200 with a return code of 0003 Job QMGPMPQK started in partition BG at 08:30:41 10/22/200 and finished at 08:01:53 10/22/200 with a return code of 0004 Job DZCWQXCS started in partition BG at 08:42:35 10/22/200 and finished at 08:05:12 10/22/200 with a return code of 0004 Job EDQYPSEG started in partition AX at 09:06:46 10/22/200 and finished at 08:22:15 10/22/200 with a return code of 0001 Job BXVWTNMB started in partition Q1 at 07:58:45 10/22/200 and finished at 07:55:04 10/22/200 with a return code of 0003 Job ZFWYJVRT started in partition BG at 08:43:40 10/22/200 and finished at 09:04:00 10/22/200 with a return code of 0000 Job SSJYBRLY started in partition AX at 08:08:56 10/22/200 and finished at 07:57:29 10/22/200 with a return code of 0001 c:\test> =cut __DATA__ 000 Q1 0897 // JOB SPZDIAGL + 08:39:12 10/22/2002 001 Q1 0652 MSG001 SPZDIAGL + 08:26:24 10/22/2002 002 ZK 0229 // JOB HGPKIOEV + 08:58:19 10/22/2002 003 AX 0108 // JOB EDQYPSEG + 09:06:46 10/22/2002 004 ZK 0735 MSG001 HGPKIOEV + 08:57:16 10/22/2002 005 AX 0218 MSG001 EDQYPSEG + 08:42:01 10/22/2002 006 P1 0345 // JOB NEQZQCPB + 07:56:18 10/22/2002 007 P1 0735 EOJ NEQZQCPB MAX.RETURN CODE=0001 + 08:21:51 10/22/2002 008 ZK 0973 MSG002 HGPKIOEV + 07:56:08 10/22/2002 009 Q1 0317 MSG002 SPZDIAGL + 08:47:36 10/22/2002 010 ZK 0945 EOJ HGPKIOEV MAX.RETURN CODE=0002 + 07:56:05 10/22/2002 011 Q1 0349 MSG003 SPZDIAGL + 08:02:11 10/22/2002 012 BG 0903 // JOB QMGPMPQK + 08:30:41 10/22/2002 013 AX 0872 MSG002 EDQYPSEG + 07:55:13 10/22/2002 014 AX 0931 MSG003 EDQYPSEG + 08:09:46 10/22/2002 015 Q1 0697 EOJ SPZDIAGL MAX.RETURN CODE=0003 + 09:00:33 10/22/2002 016 Q1 0470 // JOB BXVWTNMB + 07:58:45 10/22/2002 017 BG 0856 MSG001 QMGPMPQK + 08:18:55 10/22/2002 018 BG 0625 MSG002 QMGPMPQK + 07:57:50 10/22/2002 019 BG 0753 MSG003 QMGPMPQK + 09:08:36 10/22/2002 020 BG 0188 MSG004 QMGPMPQK + 08:36:16 10/22/2002 021 BG 0639 EOJ QMGPMPQK MAX.RETURN CODE=0004 + 08:01:53 10/22/2002 022 BG 0766 // JOB DZCWQXCS + 08:42:35 10/22/2002 023 AX 0729 MSG004 EDQYPSEG + 08:37:13 10/22/2002 024 BG 0509 MSG001 DZCWQXCS + 08:17:23 10/22/2002 025 BG 0276 MSG002 DZCWQXCS + 07:55:58 10/22/2002 026 Q1 0581 MSG001 BXVWTNMB + 08:30:49 10/22/2002 027 BG 0553 MSG003 DZCWQXCS + 09:09:26 10/22/2002 028 BG 0473 EOJ DZCWQXCS MAX.RETURN CODE=0004 + 08:05:12 10/22/2002 029 AX 0925 EOJ EDQYPSEG MAX.RETURN CODE=0001 + 08:22:15 10/22/2002 030 Q1 0985 MSG002 BXVWTNMB + 08:49:00 10/22/2002 031 Q1 0292 EOJ BXVWTNMB MAX.RETURN CODE=0003 + 07:55:04 10/22/2002 032 BG 0004 // JOB ZFWYJVRT + 08:43:40 10/22/2002 033 BG 0136 MSG001 ZFWYJVRT + 09:13:00 10/22/2002 034 AX 0547 // JOB SSJYBRLY + 08:08:56 10/22/2002 035 BG 0476 MSG002 ZFWYJVRT + 08:28:15 10/22/2002 036 AX 0671 MSG001 SSJYBRLY + 07:56:40 10/22/2002 037 BG 0207 MSG003 ZFWYJVRT + 08:10:07 10/22/2002 038 BG 0275 EOJ ZFWYJVRT MAX.RETURN CODE=0000 + 09:04:00 10/22/2002 039 AX 0808 EOJ SSJYBRLY MAX.RETURN CODE=0001 + 07:57:29 10/22/2002

    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://206952]
Approved by SparkeyG
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (2)
As of 2021-12-08 10:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (34 votes). Check out past polls.

    Notices?