Re: Pimp My RegEx

Your regex is really horribly inefficient. Think about what it has to do to match that pattern. A much simpler way to do the same thing is:

#!/usr/local/bin/perl

use warnings;
use strict;
use English;
use Data::Dumper;
use Time::HiRes 'time';

my $logfile;

my $count = 0;                  # Initialize counter
my $start = time();             # Start the timer

my $fullrec;

while( <DATA>) {
    if (/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/) {
        $count++;
        #if (defined($fullrec)) {
        #   process($fullrec);
        #}
        $fullrec = $_;
    } else {
        $fullrec .= $_;
    }
}

#if (defined($fullrec)) {
#   process($fullrec);
#}

my $end = time();               # Stop the timer
my $elapsed = $end - $start;    # How long did that take?
my $average = $count ? $elapsed/$count : 0;  # Average processing time (0 if nothing matched)

printf "Parsed $count log file entries in %.4f seconds, averaging %.4f\n", $elapsed, $average;

exit;


__DATA__
<SNIPPED DATA>

I.e., you know that when a line starts with a date it begins a record; this also implies that it marks the end of the previous record (if such a record exists). Thus you simply need to check each line to see whether it begins a record, and if it does, do something with the previous record that you have built up. This also means that the pattern is anchored and only needs to compare Lr (the length of the pattern) chars per line, instead of the Lf (the length of the file) times Lr that your code would do (with the look-ahead assertion).

So your pattern does something like 55000000*23 character lookups. Even worse, most of those will be char class lookups, so they are inefficient to start with. If you use the line-by-line approach you are dealing with 366000*23 lookups. That's a LOT less. (Actually these are upper bounds, but I think the point is made.)
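
To put rough numbers on that, here is a small sketch that just multiplies out the figures quoted above. The sub name lookup_bounds is made up for illustration, and the results are the same loose upper bounds as in the text, not measurements.

```perl
use strict;
use warnings;

# Figures quoted in the post; all of them are rough upper bounds.
my $file_chars  = 55_000_000;   # Lf: characters in the whole file
my $line_count  = 366_000;      # lines in the file
my $pattern_len = 23;           # Lr: characters the timestamp pattern inspects

# Hypothetical helper: worst-case lookup counts for both approaches.
sub lookup_bounds {
    my ($chars, $lines, $len) = @_;
    my $slurped = $chars * $len;    # pattern may be tried at every character offset
    my $by_line = $lines * $len;    # anchored pattern is tried once per line
    return ($slurped, $by_line, $slurped / $by_line);
}

my ($slurped, $by_line, $ratio) =
    lookup_bounds($file_chars, $line_count, $pattern_len);

printf "slurped: %d lookups, line by line: %d lookups, ratio: about %.0fx\n",
    $slurped, $by_line, $ratio;
```

The ratio is simply the file length divided by the line count, so the longer the average line, the more the anchored line-by-line match saves.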

Update: Fixed as per ikegami's reply.

---
$world=~s/war/peace/g

Replies are listed 'Best First'.
Re^2: Pimp My RegEx by ikegami (Patriarch) on May 31, 2005 at 18:58 UTC
That should be:

while( <DATA>) {
    if (/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/) {
        $count++;
        #if (defined($fullrec)) {
        #   process($fullrec);
        #}
        $fullrec = $_;
    } else {
        $fullrec .= $_;
    }
}

#if (defined($fullrec)) {
#   process($fullrec);
#}

or look at my solution to eliminate this redundancy.
Re^3: Pimp My RegEx by demerphq (Chancellor) on May 31, 2005 at 19:03 UTC
Indeed. Good catch. :-)

---
$world=~s/war/peace/g
Re^2: Pimp My RegEx by heathen (Acolyte) on May 31, 2005 at 19:07 UTC
Thanks for sharing your wisdom! I can't believe how quickly you guys came up w/ replies. The PerlMonks community is a great resource!

I wanted to use file slurp mode so that I could MOVE old log file entries from the current log into a separate monthly log file (and then tar/gzip 'em). I thought that by using the slurp mode, I could essentially edit the file in place without having to create a temporary file. The file system these logs live in is often constrained for space.

Thanks again!
Re^3: Pimp My RegEx by eyepopslikeamosquito (Archbishop) on May 31, 2005 at 21:20 UTC
"I thought that by using the slurp mode, I could essentially edit the file in place without having to create a temporary file."

No, that is unsound. You can corrupt your log files. Consider:

1. Slurp the log file.
2. Apply regex to file contents.
3. Write the log file with new file contents.

Now, if the write fails due to "disk full" (a distinct possibility in your environment), or because you lose power at that instant, or for any old reason, you have just corrupted your log file. Generally, you should write the modified file to a temporary file and rename the temporary file over the original only after checking that the temporary file was created without error.
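
A minimal sketch of that write-to-temporary-then-rename approach, assuming the new contents are already in a scalar. The sub name replace_file_safely and the template 'logtmpXXXXXX' are made up for illustration; File::Temp and File::Basename are core modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: replace $path with $new_contents,
# or die with the original file left untouched.
sub replace_file_safely {
    my ($path, $new_contents) = @_;

    # Create the temporary file next to the original so that
    # rename() never has to cross a filesystem boundary.
    my ($tmp_fh, $tmp_name) =
        tempfile('logtmpXXXXXX', DIR => dirname($path), UNLINK => 0);

    # Buffered write errors such as "disk full" may only show up
    # at close(), so check both print and close.
    my $ok = print {$tmp_fh} $new_contents;
    $ok = close($tmp_fh) && $ok;
    if (!$ok) {
        my $err = $!;
        unlink $tmp_name;
        die "Could not write $tmp_name: $err\n";
    }

    # Only now is the original replaced.
    rename $tmp_name, $path
        or die "Could not rename $tmp_name to $path: $!\n";
    return 1;
}
```

Called as replace_file_safely($logfile, $new_contents). Note that this still needs enough free space for the temporary copy, and that the temporary file is created with File::Temp's restrictive default permissions, so you may want to chmod it to match the original before the rename.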
Re^3: Pimp My RegEx by demerphq (Chancellor) on May 31, 2005 at 19:13 UTC
"I could essentially edit the file in place without having to create a temporary file."

Generally this isn't possible. You can only overwrite the existing parts of the log, not slice them out of the file. The normal way to do what you want is to read your input records as shown here by me and several of the other monks, process them, write them to a new file, and then delete the old file. You can't really remove from the beginning of a file while something is writing to the end.

Anyway, glad to help. :-)

---
$world=~s/war/peace/g
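
A rough sketch of that read-and-rewrite approach, assuming records start with the same timestamp pattern used in the parent node. The sub name split_records and the cutoff argument are made up for illustration; it copies each record either to an archive handle or to the handle for the new, shorter log.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each multi-line record from $in to $old_fh
# or $keep_fh. A record goes to $old_fh when its date (YYYY-MM-DD)
# sorts before $cutoff. Returns the number of old and kept records.
sub split_records {
    my ($in, $cutoff, $old_fh, $keep_fh) = @_;
    my $out = $keep_fh;     # lines before the first timestamp stay in the log
    my ($old, $kept) = (0, 0);

    while (my $line = <$in>) {
        if ($line =~ /^(\d\d\d\d-\d\d-\d\d) \d\d:\d\d:\d\d:\d\d\d/) {
            # A timestamp starts a new record. Dates in this
            # format compare correctly as plain strings.
            if ($1 lt $cutoff) { $out = $old_fh;  $old++  }
            else               { $out = $keep_fh; $kept++ }
        }
        # Continuation lines follow whichever record they belong to.
        print {$out} $line or die "Write failed: $!\n";
    }
    return ($old, $kept);
}
```

You would open the current log for reading, the monthly archive for appending and a new file for the kept records, call split_records($in, '2005-05-01', $archive, $new), and replace the old log only after both output handles have closed without error.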
Re^4: Pimp My RegEx by heathen (Acolyte) on May 31, 2005 at 19:25 UTC
"Generally this isn't possible. You can only overwrite the existing parts of the log, not slice them out of the file."

Alas, it was a worthy effort!

