PerlMonks  

Re: Pimp My RegEx

by demerphq (Chancellor)
on May 31, 2005 at 18:51 UTC


in reply to Pimp My RegEx

Your regex is really horribly inefficient. Think about what it has to do to match that pattern. A much simpler way to do the same thing is:

#!/usr/local/bin/perl
use warnings;
use strict;
use English;
use Data::Dumper;
use Time::HiRes 'time';

my $logfile;
my $count = 0;        # Initialize counter
my $start = time();   # Start the timer
my $fullrec;

while (<DATA>) {
    if (/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/) {
        $count++;
        #if ($fullrec) {
        #    process($fullrec);
        #}
        $fullrec = $_;
    } else {
        $fullrec .= $_;
    }
}
#if (defined($fullrec)) {
#    process($fullrec);
#}

my $end     = time();            # Stop the timer
my $elapsed = $end - $start;     # How long did that take?
my $average = $elapsed/$count;   # Average processing time
printf "Parsed $count log file entries in %.4f seconds, averaging %.4f\n",
    $elapsed, $average;
exit;

__DATA__
<SNIPPED DATA>

I.e., you know that when a line starts with a date it begins a record; this also implies that it marks the end of the previous record (if such a record exists). Thus you simply need to check each line to see if it begins a record, and then do something with the previous one that you have constructed. This also means that the pattern is anchored and only needs to compare Lr (the length of the pattern) chars per line, instead of the Lf (the length of the file) times Lr that your code would do (with the look-ahead assertion).
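The same idea can be sketched as a small reusable sub. This is only an illustration: `group_records` and `$RECORD_START` are made-up names, and the sample lines are invented. Unlike the loop above, it treats any lines that come before the first timestamp as a record of their own.

```perl
use strict;
use warnings;

# Anchored pattern: a record starts with a timestamp like
# "2005-05-31 18:51:00:123" at the very beginning of a line.
my $RECORD_START = qr/^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d:\d{3}/;

# Hypothetical helper: group lines into records.
sub group_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        if ($line =~ $RECORD_START || !@records) {
            push @records, $line;     # this line starts a new record
        }
        else {
            $records[-1] .= $line;    # continuation of the current record
        }
    }
    return @records;
}

# Invented sample input: two records, the first spanning two lines.
my @sample = (
    "2005-05-31 18:51:00:123 first entry\n",
    "  continuation of first entry\n",
    "2005-05-31 18:52:10:456 second entry\n",
);
my @records = group_records(@sample);
printf "%d records\n", scalar @records;
```

For a large log you would normally feed lines from a filehandle one at a time rather than build the whole list in memory; the list version just keeps the sketch short.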

So your pattern does something like 55000000*23 character lookups; even worse, most of those will be char class lookups, so they are inefficient to start with. If you use the line-by-line approach you are dealing with 366000*23 lookups. That's a LOT less. (Actually these are upper bounds, but I think the point is made.)
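To put rough numbers on that, here is the arithmetic spelled out. It assumes that 55000000 is the file length in characters and 366000 the number of lines, which is how the figures read in context; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $file_chars  = 55_000_000;   # assumed: file length in characters (Lf)
my $line_count  = 366_000;      # assumed: number of lines in the file
my $pattern_len = 23;           # chars in a timestamp like "2005-05-31 18:51:00:123" (Lr)

my $slurp_lookups = $file_chars * $pattern_len;   # upper bound, slurp + look-ahead
my $line_lookups  = $line_count * $pattern_len;   # upper bound, anchored per-line match

printf "slurp: %d lookups, line by line: %d lookups, ratio about %.0f to 1\n",
    $slurp_lookups, $line_lookups, $slurp_lookups / $line_lookups;
```

That works out to about 1.27 billion versus about 8.4 million lookups, roughly a factor of 150.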

Update: Fixed as per ikegami's reply.

---
$world=~s/war/peace/g

Replies are listed 'Best First'.
Re^2: Pimp My RegEx
by ikegami (Patriarch) on May 31, 2005 at 18:58 UTC
    That should be:
    while (<DATA>) {
        if (/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/) {
            $count++;
            #if (defined($fullrec)) {
            #    process($fullrec);
            #}
            $fullrec = $_;
        } else {
            $fullrec .= $_;
        }
    }
    #if (defined($fullrec)) {
    #    process($fullrec);
    #}

    or look at my solution to eliminate this redundancy.

      Indeed. Good catch. :-)

      ---
      $world=~s/war/peace/g

Re^2: Pimp My RegEx
by heathen (Acolyte) on May 31, 2005 at 19:07 UTC

    Thanks for sharing your wisdom! I can't believe how quickly you guys came up w/ replies. The PerlMonks community is a great resource!

    I wanted to use file slurp mode so that I could MOVE old log file entries from the current log into a separate monthly log file. (and then tar/gzip 'em)

    I thought that by using the slurp mode, I could essentially edit the file in place without having to create a temporary file. The file system these logs live in is often constrained for space.

    Thanks again!

      I thought that by using the slurp mode, I could essentially edit the file in place without having to create a temporary file.
      No, that is unsound. You can corrupt your log files. Consider:
      1. Slurp the log file.
      2. Apply regex to file contents.
      3. Write the log file with new file contents.
      Now, if the write fails due to "disk full" (a distinct possibility in your environment) or because you lose power at that instant, or for any old reason, you have just corrupted your log file. Generally, you should write the modified file to a temporary file and rename the temporary file over the original only after checking that the temporary file was created without error.
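A minimal sketch of that write-then-rename pattern, using the core File::Temp and File::Basename modules. The sub name `write_file_safely` is hypothetical, and the sketch assumes the temporary file can live in the same directory as the log, so that `rename` stays on one filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: replace $path with $content by way of a temp file.
# The original file is only replaced after the new one is fully written.
sub write_file_safely {
    my ($path, $content) = @_;

    # Same directory as the target, so rename() does not cross filesystems.
    my ($fh, $tmp) = tempfile('tmpXXXXXX', DIR => dirname($path), UNLINK => 0);

    my $ok = print {$fh} $content;
    $ok = close($fh) && $ok;    # close() is where buffered write errors show up
    unless ($ok) {
        my $err = $!;
        unlink $tmp;
        die "Could not write $tmp: $err\n";
    }

    unless (rename $tmp, $path) {
        my $err = $!;
        unlink $tmp;
        die "Could not rename $tmp to $path: $err\n";
    }
    return 1;
}
```

If the write dies partway through, the original log is left as it was. The cost is that the temporary file needs free space while both copies exist.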

      I could essentially edit the file in place without having to create a temporary file.

      Generally this isn't possible. You can only overwrite the existing parts of the log, not slice them out of the file. The normal way to do what you want is to read your input records as shown here by me and several of the other monks, process them, write them to the new file, and then delete the old file. You can't really remove from the beginning of a file while something is writing to the end.
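As a rough sketch of the "process them" step, here is one way to decide which records go to the monthly archive and which stay in the current log. `partition_by_date` is a hypothetical helper, the cutoff date and sample records are invented, and writing each list out to its own file is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: split records into those dated before $cutoff and the rest.
# $cutoff is a 'YYYY-MM-DD' string; dates in that form sort correctly as strings.
sub partition_by_date {
    my ($cutoff, @records) = @_;
    my (@old, @current);
    for my $rec (@records) {
        my ($date) = $rec =~ /^(\d{4}-\d\d-\d\d) /;
        if (defined $date && $date lt $cutoff) {
            push @old, $rec;        # goes to the archive file
        }
        else {
            push @current, $rec;    # stays in the new current log
        }
    }
    return (\@old, \@current);
}

# Invented sample records, one on each side of the cutoff.
my ($old, $current) = partition_by_date(
    '2005-05-01',
    "2005-04-30 23:59:59:999 old entry\n",
    "2005-05-31 18:51:00:123 new entry\n",
);
printf "%d archived, %d kept\n", scalar @$old, scalar @$current;
```

Records without a recognizable leading date are kept in the current log rather than archived, which seemed the safer default for a sketch.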

      Anyway, glad to help. :-)

      ---
      $world=~s/war/peace/g

        Generally this isn't possible. You can only overwrite the existing parts of the log, not slice them out of the file.

        Alas, it was a worthy effort!
