PerlMonks  

Re: Pimp My RegEx

by tlm (Prior)
on May 31, 2005 at 18:35 UTC ( [id://462154]=note )


in reply to Pimp My RegEx

I'm trying to understand the rationale behind your regex. Don't you want something more like

my $parse_log_entry = qr/^($dateRegex.*?)(?=(?:^$dateRegex|\z))/ms;
? If so, then I would process the file line-wise and count only those lines that begin with your date pattern.
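A minimal sketch of that line-wise count follows. The date pattern and the sample log text are assumptions made up for illustration, since the thread's actual $dateRegex and data are not shown here.

```perl
use strict;
use warnings;

# Hypothetical date pattern (YYYY-MM-DD HH:MM:SS); the real $dateRegex is not shown in the thread.
my $dateRegex = qr/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;

# Made-up log text: three entries, the second spanning two lines.
my $log = <<'END';
2005-05-31 18:35:00 first entry
2005-05-31 18:36:00 second entry
    continuation of the second entry
2005-05-31 18:37:00 third entry
END

# Read the string line by line through an in-memory filehandle.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my $entries = 0;
while ( my $line = <$fh> ) {
    # Only lines that start with a date begin a new entry.
    $entries++ if $line =~ /^$dateRegex/;
}
close $fh;

print "entries: $entries\n";
```

Because each line is tested on its own, there is no lookahead and no need to hold the whole file in memory.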

Or, if you need to do more than just counting, then something along the lines of

my $current;
while ( <DATA> ) {
    if ( /^$dateRegex/ ) {
        process( $current ) if defined $current;
        $current = $_;
    }
    else {
        $current .= $_;
    }
}
process( $current ) if defined $current;
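For anyone who wants to try this, here is a self-contained sketch of the same accumulate-and-flush loop. The date pattern, the sample log, and the body of process() are hypothetical stand-ins, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical date pattern; substitute the real $dateRegex.
my $dateRegex = qr/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;

# Hypothetical payload: just collect each complete entry.
my @entries;
sub process {
    my ($entry) = @_;
    push @entries, $entry;
}

# Made-up log text with one multi-line entry.
my $log = <<'END';
2005-05-31 18:35:00 first entry
2005-05-31 18:36:00 second entry
    continuation of the second entry
2005-05-31 18:37:00 third entry
END

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my $current;
while ( my $line = <$fh> ) {
    if ( $line =~ /^$dateRegex/ ) {
        # A new date line closes the previous entry, if there is one.
        process($current) if defined $current;
        $current = $line;
    }
    else {
        # Continuation line: append it to the entry being built.
        $current .= $line;
    }
}
# Flush the final entry, which has no following date line to trigger it.
process($current) if defined $current;
close $fh;

print scalar(@entries), " entries\n";
```

The final process() call after the loop is what handles the last entry. Note that in this sketch any lines appearing before the first date line would be accumulated into a leading pseudo-entry.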

Update: Added the more extended second alternative.

Update 2: Fixed bug (last line of code was missing).

the lowliest monk

Replies are listed 'Best First'.
Re^2: Pimp My RegEx
by heathen (Acolyte) on May 31, 2005 at 18:46 UTC
    For the sake of brevity, I cut out my "payload" and just counted lines. I counted lines just to benchmark the regex times. My rationale was something like this: "find a line that starts with a date, capture all of the text (including new lines) until I can lookahead and find another date."

      Yes, but as originally written your regex would miss the last log entry, because it fails the positive lookahead assertion.
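The effect can be checked with a small sketch. The original regex is not quoted in this reply, so the first pattern below is an assumed form of it (a lookahead for the next date only); the second is the variant with the \z alternative. The date pattern and log text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical date pattern; the thread's real $dateRegex is not shown.
my $dateRegex = qr/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;

# Made-up log text: three entries, the second spanning two lines.
my $log = <<'END';
2005-05-31 18:35:00 first entry
2005-05-31 18:36:00 second entry
    continuation of the second entry
2005-05-31 18:37:00 third entry
END

# Assumed form of the original: the lookahead requires another date line.
my $lookahead_only = qr/^($dateRegex.*?)(?=^$dateRegex)/ms;

# Variant that also accepts end of string as a terminator.
my $with_eos = qr/^($dateRegex.*?)(?=(?:^$dateRegex|\z))/ms;

# In list context, /g returns the captured entry from every match.
my @missed = $log =~ /$lookahead_only/g;
my @all    = $log =~ /$with_eos/g;

print "lookahead only: ", scalar(@missed), "\n";
print "with \\z:        ", scalar(@all), "\n";
```

With this sample the first pattern finds two entries and the second finds three, because no date line follows the last entry to satisfy the lookahead.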

      Regarding the payload, please see the update to my original reply.

      Update: Added further comments on the error in the original regex.

      the lowliest monk

        as originally written your regex would miss the last log entry, because it fails the positive lookahead assertion

        You are correct; that was definitely an unintended consequence. Since a log file entry could span multiple lines, I was using the lookahead to say, in effect, "stop capturing when you can see the next log file entry."
        The sample data I included in the original script probably should have included some of the more complex, multi-line log file entries.
