nikos has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

Here is the task:
There is a Postfix mail log of over 20 megabytes. You should write a Perl 5 script that parses it and prints the results.

Parsing: the script tracks one particular message in the log file. Three parameters are passed to the script: the start and end of a time interval, and the value of a 'message-id', 'from' or 'to' field. The script should find the unique MTA message identifier (the Postfix queue ID) of the very first message in the log file that matches the given conditions (the time interval and the field value: message-id, from or to). It then prints every line of the log file that belongs to that queue ID within the given time period. If several messages with the same 'from' or 'to' value match, the very first one is selected.

The script is called as: parser.pl START END MESSAGE-ID|ADDRESS
START and END define the time period, in ISO 8601 restricted time format.

ISO 8601 restricted time format
The lead-in character for a restricted ISO 8601 time is an '@'-sign. The particular format of the time in restricted ISO 8601 is: [[[[[cc]yy]mm]dd][T[hh[mm[ss]]]]]. Optional date fields default to the appropriate component of the current date; optional time fields default to midnight; hence if today is January 22, 1999, the following date specifications are all equivalent:
`19990122T000000'
`990122T000000'
`0122T000000'
`22T000000'
`T000000'
`T0000'
`T00'
`22T'
`T'
`'
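The defaulting rule above (missing leading date fields come from today's date, missing trailing time fields are zero) can be sketched as a pure string transformation before any epoch conversion. This is a minimal sketch, not part of either solution in this thread: the sub name `expand_iso` and its explicit `$today` argument (a `ccyymmdd` string, passed in so the result is reproducible) are my own assumptions, and it rejects date or time fields with an odd number of digits.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a restricted ISO 8601 time (without the leading
# '@') to a full ccyymmddThhmmss string. $today is the current date as ccyymmdd.
sub expand_iso {
    my ($str, $today) = @_;
    # date: 0, 2, 4, 6 or 8 digits; then an optional 'T' with 0, 2, 4 or 6 digits
    return undef unless $str =~ /^((?:\d\d){0,4})(?:T((?:\d\d){0,3}))?$/;
    my ($date, $time) = ($1, defined $2 ? $2 : '');
    # missing leading date fields are copied from today's date
    my $full_date = substr($today, 0, 8 - length($date)) . $date;
    # missing trailing time fields default to midnight (zeros)
    my $full_time = $time . '0' x (6 - length($time));
    return "${full_date}T$full_time";
}

# The example list above, evaluated as if today were January 22, 1999
for my $spec ('19990122T000000', '990122T000000', '0122T000000', '22T000000',
              'T000000', 'T0000', 'T00', '22T', 'T', '') {
    printf "%-18s -> %s\n", "'$spec'", expand_iso($spec, '19990122');
}
```

Every line of output ends in `-> 19990122T000000`, i.e. all ten forms are equivalent, as the format description says. Turning the expanded string into epoch seconds is then a single timelocal() call.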


I'll post my solution in a follow-up post. Any ideas on how to improve it, or how to implement it differently?
Log file:
```
Apr 26 00:00:00 edge newsyslog[25697]: logfile turned over
Apr 26 00:00:53 hosting postfix/smtpd[26207]: connect from ef.egroups.com[64.211.240.229]
Apr 26 00:00:54 hosting postfix/smtpd[26207]: 86E511176AE: client=ef.egroups.com[64.211.240.229]
Apr 26 00:00:55 hosting postfix/cleanup[23958]: 86E511176AE: message-id=<F203JO4rvQPKG1NBSxW0000c981@hotmail.com>
Apr 26 00:00:55 hosting postfix/qmgr[22567]: 86E511176AE: from=<anisimov@hotmail.com>, size=7002, nrcpt=1 (queue active)
Apr 26 00:00:55 hosting postfix/lmtp[25547]: 86E511176AE: to=<porto1@hosting.agava.ru>, relay=/var/spool/cyrus/run/lmtp[/var/spool/cyrus/run/lmtp], delay=1, status=sent (250 2.1.5 Ok)
Apr 26 00:00:55 hosting postfix/smtpd[26207]: disconnect from ef.egroups.com[64.211.240.229]
Apr 26 00:01:10 hosting postfix/smtpd[26207]: connect from adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
Apr 26 00:01:11 hosting postfix/smtpd[26207]: 45FC11176AE: client=adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
Apr 26 00:01:11 hosting postfix/smtpd[26207]: reject: RCPT from adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]: 504 <Hinvest>: Helo command rejected: need fully-qualified hostname; from=<hin_vest_@moscowmail.com> to=<webmaster@rc5.agava.ru>
Apr 26 00:01:17 hosting postfix/smtpd[26207]: lost connection after RCPT from adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
Apr 26 00:01:17 hosting postfix/smtpd[26207]: disconnect from adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
Apr 26 00:05:53 hosting postfix/smtpd[30369]: connect from adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
Apr 26 00:05:54 hosting postfix/smtpd[30369]: B4ECD1176AE: client=adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]
```
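Each Postfix line in the sample has a syslog prefix (month, day, time, host, process) followed by an optional queue ID and `key=<value>` pairs. A minimal sketch of pulling one line apart; the sub name `parse_line`, the returned hash layout, and the assumption that a queue ID is an uppercase hex string followed by `: ` (true for every queue line in the sample, but not something I've checked against Postfix documentation) are mine.

```perl
use strict;
use warnings;

# Hypothetical helper: split one syslog line into its timestamp fields,
# queue ID (if any) and the message-id/from/to values it mentions.
sub parse_line {
    my ($line) = @_;
    # at most 6 fields: month, day, time, host, process, rest of the message
    my ($mon, $day, $time, $host, $proc, $msg) = split ' ', $line, 6;
    return undef unless defined $msg;
    my %rec = (mon => $mon, day => $day, time => $time, proc => $proc);
    # queue lines start with "QUEUEID: " (uppercase hex in this log)
    $rec{qid} = $1 if $msg =~ /^([0-9A-F]+): /;
    # collect every message-id=<...>, from=<...> and to=<...> on the line
    while ($msg =~ /\b(message-id|from|to)=<([^>]*)>/g) {
        $rec{$1} = $2;
    }
    return \%rec;
}

my $r = parse_line('Apr 26 00:00:55 hosting postfix/qmgr[22567]: '
    . '86E511176AE: from=<anisimov@hotmail.com>, size=7002, nrcpt=1 (queue active)');
print "$r->{qid} $r->{from}\n";    # prints "86E511176AE anisimov@hotmail.com"
```

The `reject:` line has `from=<...>` and `to=<...>` but no queue ID, so it yields no `qid` and can never be selected as "the message".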

Re: Postfix maillog parser
by nikos (Scribe) on Apr 02, 2005 at 00:55 UTC
```perl
#!/usr/bin/perl -w
use strict;
use Time::Local;

if( @ARGV != 3 ) {
    print "Usage: $0 START END MESSAGE-ID|ADDRESS\n";
    print "i.e. $0 \@20050426T000000 \@0426T08 videok4\@hosting.agava.ru\n";
    exit;
}
my ($start, $end, $m_id)=@ARGV;

# Convert a restricted ISO 8601 time, @[[[[[cc]yy]mm]dd][T[hh[mm[ss]]]]],
# to epoch seconds; returns undef on a syntax error or an impossible date.
sub iso2epoch($) {
    my $str=shift;
    my ($mday, $mon, $year)=(localtime)[3,4,5];
    $year+=1900;
    my ($hr, $min, $sec)=(0, 0, 0);
    # the 'T' part is optional, so a date alone ("@20050426") is accepted too
    return undef unless $str =~ /^\@(\d*)(?:T(\d*))?$/;
    my ($date, $time)=($1, defined $2 ? $2 : "");
    if( $date =~ /^(\d{2})(\d{2})(\d{2})(\d{2})$/ ) {      # ccyymmdd
        $year="$1$2"; $mon=$3-1; $mday=$4;
    } elsif ( $date =~ /^(\d{2})(\d{2})(\d{2})$/ ) {        # yymmdd, current century
        $year=substr($year, 0, 2) . $1; $mon=$2-1; $mday=$3;
    } elsif ( $date =~ /^(\d{2})(\d{2})$/ ) {               # mmdd
        $mon=$1-1; $mday=$2;
    } elsif ( $date =~ /^(\d{2})$/ ) {                       # dd
        $mday=$1;
    } elsif ( $date ne "" ) {                                # syntax error
        return undef;
    }
    if( $time =~ /^(\d{2})(\d{2})(\d{2})$/ ) {              # hh mm ss
        ($hr, $min, $sec)=($1, $2, $3);
    } elsif ( $time =~ /^(\d{2})(\d{2})$/ ) {               # hh mm
        ($hr, $min)=($1, $2);
    } elsif ( $time =~ /^(\d{2})$/ ) {                       # hh
        $hr=$1;
    } elsif ( $time ne "" ) {                                # syntax error
        return undef;
    }
    # timelocal() dies on out-of-range values (e.g. month 13); return undef instead
    return eval { timelocal($sec, $min, $hr, $mday, $mon, $year) };
}

my %mon2mm=qw/Jan 0 Feb 1 Mar 2 Apr 3 May 4 Jun 5 Jul 6 Aug 7 Sep 8 Oct 9 Nov 10 Dec 11/;
my ($month, $day, $time, $hr, $min, $sec, $msg_unixtime, $year);
my ($host, $proc, @msg, $msg, $qid);
$year=(localtime)[5]+1900;    # syslog lines carry no year: assume the current one
$qid="";

$start=iso2epoch($start);
$end=iso2epoch($end);
if( !defined($start) ) { print "START: $ARGV[0] is not correct\n"; }
if( !defined($end) )   { print "END: $ARGV[1] is not correct\n"; }
if( !defined($start) || !defined($end) ) { exit; }

print "Parsing maillog.log\n";
open(my $fh, '<', 'maillog.log') or die "Cannot open maillog.log: $!";

# Pass 1: queue ID of the first message whose message-id, from or to matches
while( <$fh> ) {
    ($month, $day, $time, $host, $proc, @msg)=split;
    ($hr, $min, $sec)=split ':', $time;
    $msg_unixtime=timelocal($sec, $min, $hr, $day, $mon2mm{$month}, $year);
    last if ( $msg_unixtime > $end );
    next if ( $msg_unixtime < $start );
    $msg=join ' ', @msg;
    if( $msg =~ /(message-id|from|to)=<(.*?)>/ ) {
        if( $2 eq $m_id ) {                        # message found
            if( $msg =~ /^([A-F0-9]+): / ) {
                $qid=$1;
                last;
            }
        }
    }
}
if( $qid eq "" ) {
    print "No messages found\nDone\n";
    exit;
}
print "QID found: $qid\n";

# Pass 2: rewind and print every line of the interval that carries this queue ID
seek($fh, 0, 0);
while( <$fh> ) {
    ($month, $day, $time, $host, $proc, @msg)=split;
    ($hr, $min, $sec)=split ':', $time;
    $msg_unixtime=timelocal($sec, $min, $hr, $day, $mon2mm{$month}, $year);
    last if ( $msg_unixtime > $end );
    next if ( $msg_unixtime < $start );
    $msg=join ' ', @msg;
    print if $msg =~ /\b\Q$qid\E\b/;             # whole-word match on the queue ID
}
close($fh);
print "Done\n";
```
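The two passes above could be folded into one by buffering lines per queue ID: earlier lines for an ID (such as its `client=` line) are still in memory when the matching `from`, `to` or `message-id` line shows up. This is a sketch of that different approach, not a drop-in replacement: the sub name `trace_message`, reading from an already-open filehandle, and the `$epoch_of` callback standing in for the per-line timelocal() call are assumptions. The trade-off is memory for the buffered lines versus rereading a 20 MB file.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant: buffer lines per queue ID until the
# target message-id/from/to appears, then keep collecting that ID's lines.
# $epoch_of is a code ref returning a line's epoch time (e.g. via timelocal).
sub trace_message {
    my ($fh, $start, $end, $target, $epoch_of) = @_;
    my (%seen, $qid, @out);    # %seen: queue ID => lines read so far for it
    while (my $line = <$fh>) {
        my $t = $epoch_of->($line);
        next if $t < $start;
        last if $t > $end;
        my ($id) = $line =~ /: ([0-9A-F]+): /;   # queue ID after "proc[pid]: "
        next unless defined $id;
        if (defined $qid) {                       # already found: only its lines
            push @out, $line if $id eq $qid;
            next;
        }
        push @{ $seen{$id} }, $line;
        if ($line =~ /\b(?:message-id|from|to)=<\Q$target\E>/) {
            $qid  = $id;
            @out  = @{ $seen{$id} };              # lines seen earlier for this ID
            %seen = ();                           # other buffers no longer needed
        }
    }
    return ($qid, @out);
}

# Usage on three lines of the sample log, with a toy epoch function that only
# looks at hh:mm:ss (a real script would use timelocal as in the script above).
my $log = join '', map { "$_\n" }
    'Apr 26 00:00:54 hosting postfix/smtpd[26207]: 86E511176AE: client=ef.egroups.com[64.211.240.229]',
    'Apr 26 00:00:55 hosting postfix/qmgr[22567]: 86E511176AE: from=<anisimov@hotmail.com>, size=7002, nrcpt=1 (queue active)',
    'Apr 26 00:01:11 hosting postfix/smtpd[26207]: 45FC11176AE: client=adsl-20-151-106.sdf.bellsouth.net[66.20.151.106]';
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $secs = sub { my ($h, $m, $s) = $_[0] =~ / (\d\d):(\d\d):(\d\d) /; $h * 3600 + $m * 60 + $s };
my ($qid, @lines) = trace_message($fh, 0, 86400, 'anisimov@hotmail.com', $secs);
print "QID found: $qid\n", @lines;
```

Unlike the posted script, this checks every `key=<...>` pair on a line rather than only the first one, and it never needs to seek back in the file.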
      You should really truncate the log file example to a couple of lines; it's a bit annoying to scroll all the way down.

      Anyway, without looking at the rest of your code, here's an alternative take on your iso2epoch sub (mine is called iso2epoc). I'm not sure this approach is preferable, but it was the first idea that came to mind. I had to change the year in the examples to 2005, since a six-digit date such as 990122 gets the current century (2099 here), and timelocal doesn't seem to handle 2099 on my box.
```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local;

sub iso2epoc {
    my $iso = shift;

    # check input and separate the date and time
    return undef unless ($iso =~ /^\@(?:(\d{0,8})T?(\d{0,6}))?$/);

    unless (length($iso) == 16) {
        my $date = $1 || '';
        my $time = $2 || '';

        # we need to know how many characters each has
        my $len_date = length($date);
        my $len_time = length($time);

        # get current date & time as defaults
        my $defdate = strftime("%Y%m%d", localtime);
        my $deftime = strftime("%H%M%S", localtime);

        # now we just copy the missing parts before the incomplete date & time
        $iso = '@' . substr($defdate, 0, 8 - $len_date) . $date
             # this assumes T22 means 22:00:00
             . 'T' . $time . substr($deftime, 0, 6 - $len_time);
    }

    if ($iso =~ /^\@(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})$/) {
        return timelocal($6,$5,$4,$3,$2 - 1,$1);
    }
    return undef;
}

foreach my $iso (<DATA>) {
    chomp $iso;
    my $epoc = iso2epoc($iso) or die "wrong input: $iso";
    print "$iso -> $epoc (" . strftime("%Y-%m-%d %X", localtime($epoc)) . ")\n";
}

__DATA__
@20050122T000000
@050122T000000
@0122T000000
@22T000000
@T000000
@T0000
@T00
@22T
@
```
      prints:
```
@20050122T000000 -> 1106348400 (2005-01-22 00:00:00)
@050122T000000 -> 1106348400 (2005-01-22 00:00:00)
@0122T000000 -> 1106348400 (2005-01-22 00:00:00)
@22T000000 -> 1114120800 (2005-04-22 00:00:00)
@T000000 -> 1112392800 (2005-04-02 00:00:00)
@T0000 -> 1112410800 (2005-04-02 05:00:00)
@T00 -> 1112412540 (2005-04-02 05:29:00)
@22T -> 1114140596 (2005-04-22 05:29:56)
@ -> 1112412596 (2005-04-02 05:29:56)
```
        Thanks for the reply, I'll test it. It's certainly shorter than my solution. Thanks also for the remark about truncating the log file: I put it inside a readmore tag, but that doesn't seem to work. Strange, it worked for me before. Thanks again.