PerlMonks  

Re: regular expression - grabbing everything problem

by davido (Cardinal)
on Aug 09, 2011 at 00:16 UTC


in reply to regular expression - grabbing everything problem

If you wish to process the file line by line, you can use the flip-flop operator. This would make it unnecessary to use explicit control flags. Here's an example:

while ( <DATA> ) {
    print if /^Header2:/ .. eof;
}

__DATA__
<lsmothers@example.com> SMTP 0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
.X-Intermail-Unknown-MIME-Type=unparsedmessage
Header2: <headertwo@example.com
Received: from server.cluster1.example.com ([10.20.201.160])
line 12

Update: as suggested in a followup to this post, the RHS of the flip-flop is now eof. That is especially nice if the script is altered to read from <>.

It may seem strange to use the flip-flop operator if you're only concerned with the initial flip, but it works nicely. And if you're processing more than one file, the eof on the right-hand side ends the range at the end of each file, which resets the search for the next file. The flip-flop operator is discussed in the "Range Operators" section of perlop, since it is the same '..' operator that builds ranges in list context.
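To see the reset in action, here is a minimal sketch that runs the same flip-flop over two in-memory "files". The sub name tail_from_header and the sample strings are made up for illustration; the explicit eof($fh) stands in for the bare eof used above, since this version reads from lexical filehandles rather than DATA or <>.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): for each filehandle in
# turn, keep the lines from the first /^Header2:/ line through end of file.
sub tail_from_header {
    my @handles = @_;
    my @kept;
    for my $fh (@handles) {
        while ( my $line = <$fh> ) {
            # The range switches on at the Header2: line and switches off
            # once eof($fh) is true, so it is re-armed for the next handle.
            push @kept, $line if $line =~ /^Header2:/ .. eof($fh);
        }
    }
    return @kept;
}

# Two in-memory "files" holding made-up sample data.
my $text_a = "skip a\nHeader2: a\nkeep a\n";
my $text_b = "skip b\nHeader2: b\nkeep b\n";
open my $fh_a, '<', \$text_a or die "open: $!";
open my $fh_b, '<', \$text_b or die "open: $!";

print tail_from_header( $fh_a, $fh_b );
```

Neither "skip a" nor "skip b" is printed. If the range never switched off, "skip b" would leak through, because the flip-flop would still be on when the second file starts.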

If you prefer to slurp the file into a string and process accordingly, you can do it like this:

my $input = do { local $/ = undef; <DATA> };
if ( $input =~ /^(Header2:.+)/ms ) {
    print $1;
}
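The two modifiers on that regex each do a separate job, which a small side-by-side sketch may make clearer. The sub names and the sample string below are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first line starting with
# "Header2:" to the end of the string, or undef if no such line exists.
sub from_header {
    my ($text) = @_;
    # /m lets ^ match at the start of any line, not only the start of the string.
    # /s lets . match newlines, so .+ runs through to the end of the string.
    return $text =~ /^(Header2:.+)/ms ? $1 : undef;
}

# Same pattern without /s: .+ stops at the first newline.
sub header_line_only {
    my ($text) = @_;
    return $text =~ /^(Header2:.+)/m ? $1 : undef;
}

my $sample = "junk\nHeader2: one\nbody\ntail\n";    # made-up data

print from_header($sample);
print "---\n";
print header_line_only($sample), "\n";
```

Without /m the pattern would not match this sample at all, because ^ would only be tried at the very start of the string, where "junk" sits.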

Or even...

my $input;
{
    local $/ = undef;
    $input = <DATA>;
}
print join '', ( split /^(Header2:)/m, $input, 3 )[ 1, 2 ];

The split method could be altered to avoid capturing by using a lookahead assertion as the split point, like this:

print join '', ( split /^(?=Header2:)/m, $input, 2)[1];

This method creates only two elements: the one we don't want, and the one we do. The other split method creates three elements: the one we don't want, the trigger text, and the rest of what we want to keep. That is why it has to ask for both elements 1 and 2.
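Here is a rough sketch that counts the elements each split produces. The sub names and sample data are made up for illustration, and "Header2:" appears twice in the sample on purpose. One detail it brings out: captured separators do not count toward split's LIMIT, so with a second Header2: line in the input, a LIMIT of 3 splits again, while a LIMIT of 2 still yields the three elements described above.

```perl
use strict;
use warnings;

# Made-up sample input; "Header2:" appears twice on purpose.
my $input = "junk\nHeader2: one\nbody\nHeader2: two\ntail\n";

# Capturing split: the captured separator comes back as an extra element
# and does not count toward LIMIT.
sub parts_capturing {
    my ( $text, $limit ) = @_;
    return split /^(Header2:)/m, $text, $limit;
}

# Lookahead split: nothing is consumed, so the trigger text stays attached
# to the element that follows it.
sub parts_lookahead {
    my ($text) = @_;
    return split /^(?=Header2:)/m, $text, 2;
}

my @cap3 = parts_capturing( $input, 3 );
my @cap2 = parts_capturing( $input, 2 );
my @look = parts_lookahead($input);

printf "capturing, LIMIT 3: %d elements\n", scalar @cap3;
printf "capturing, LIMIT 2: %d elements\n", scalar @cap2;
printf "lookahead, LIMIT 2: %d elements\n", scalar @look;
print join '', @cap2[ 1, 2 ];
```

With only one Header2: line in the input, as in the data above, LIMIT 3 and LIMIT 2 behave the same for the capturing split.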

One-liner versions of each of the above:

perl -ne 'print if /^Header2:/ .. eof' testdata.txt
perl -0777 -ne '/^(Header2:.+)/ms and print $1' testdata.txt
perl -0777 -pe '$_=join q//,(split /^(Header2:)/m,$_,3)[1,2]' testdata.txt
perl -0777 -pe '$_=join q//,(split /^(?=Header2:)/m,$_,2)[1]' testdata.txt

Dave

Re^2: regular expression - grabbing everything problem
by jwkrahn (Abbot) on Aug 09, 2011 at 00:42 UTC

    For your first example, 1 is always true, which will work fine if there is only one file in @ARGV; however, you should probably use eof instead.

      That's a great suggestion. I intended for it to be always true, which seemed fine for simplicity's sake. But if someone is using @ARGV or the empty diamond operator, eof is the answer. Excellent. Updating now. Thanks!


      Dave
