PerlMonks  

Answer: How do I extract all text between two keywords like start and end?

(#7675: categorized answer)

Contributed by stephen

Hmmm... I'm afraid the recursive 'between' above might not work for nested cases. The non-greedy regexp matches the first start-end pair it finds, so if we had:

    yadda yadda start this is comment start this is still comment end this should still be comment end yadda yadda

then we should wind up with the whole thing, minus the 'start's, the 'end's, and the 'yadda's, but instead we get:

    this is comment start this is still comment
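
To see the problem in isolation, here is a minimal sketch. It does not reproduce the recursive 'between' from the earlier answer; it only assumes that answer relies on a non-greedy match of the form start(.*?)end, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'yadda yadda start this is comment start this is still comment '
         . 'end this should still be comment end yadda yadda';

# Non-greedy: the capture stops at the first 'end' after the first 'start',
# so the nested 'start' is swallowed and the outer 'end' is never reached.
my ($inner) = $text =~ m{start(.*?)end}s;

print "[$inner]\n";
# prints: [ this is comment start this is still comment ]
```

The brackets are only there to make the leading and trailing spaces visible.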

The only way I can think of to get around this is to keep track of the nesting level ourselves. This also de-recurses it, which makes it less beautiful, but faster (in theory):

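    sub between {
        my ($text) = @_;
        my $level    = 0;     # how many 'start's deep we currently are
        my @comments = ();

        while ( $text =~ m{
                    \G .*?                  # resume where the last match stopped
                    (start|end)             # the tag itself
                    (.*?)                   # the text that follows it...
                    (?: (?=start|end) | $ ) # ...up to the next tag or end of text
                }gxs )
        {
            if ( $1 eq 'start' ) { $level++; }
            else                 { ($level > 0) and $level--; }

            # Keep the segment only while we are inside at least one 'start'.
            $level > 0 and push(@comments, $2);
        }

        return join('', @comments);
    }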

This returns (with runs of whitespace collapsed):

this is comment this is still comment this should still be comment

So what we're doing here is going through the text looking for 'start's and 'end's. We keep a counter indicating how many levels deep we are in 'start's and 'end's. Every time we hit a 'start', we add one. Every time we hit an 'end', we subtract one, checking first to make sure that our level doesn't go negative. (Otherwise, somebody could mess us up by starting a file "end end end".)
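
The counter on its own can be sketched like this. The helper name level_trace is hypothetical and not part of the answer; it only records the level after each tag, so you can watch the guard keep a run of leading 'end's from driving the level negative.

```perl
use strict;
use warnings;

# Hypothetical helper: return the nesting level in effect after each tag.
sub level_trace {
    my ($text) = @_;
    my $level = 0;
    my @trace;

    while ( $text =~ m{(start|end)}g ) {
        if    ( $1 eq 'start' ) { $level++; }
        elsif ( $level > 0 )    { $level--; }   # an unmatched 'end' is ignored
        push @trace, $level;
    }
    return @trace;
}

print join(' ', level_trace('end end end start a start b end c end')), "\n";
# prints: 0 0 0 1 2 1 0
```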

Afterwards, we look at the patch of text between the current tag and the next 'start' or 'end' tag. If our level is greater than 0, we're between a 'start' and an 'end' tag, so we store that segment. Otherwise, we're not, so we skip it. Either way, we then look for the next 'start' or 'end' tag, until we reach the end of the text.
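
To watch the segment-by-segment decision, here is a rough sketch that pairs each segment with the level in effect for it. Note that it swaps in split with a capturing group instead of the \G regexp, purely because that makes the segments easy to list; the helper name segments_with_level is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [level, segment] pairs, one per tag.
sub segments_with_level {
    my ($text) = @_;
    my $level = 0;
    my @rows;

    # With a capturing group, split keeps the tags in its result:
    # (leading text, tag, segment, tag, segment, ...)
    my ($lead, @parts) = split /(start|end)/, $text;

    while (@parts) {
        my $tag     = shift @parts;
        my $segment = @parts ? shift @parts : '';   # split drops a trailing empty field
        if    ( $tag eq 'start' ) { $level++; }
        elsif ( $level > 0 )      { $level--; }
        push @rows, [ $level, $segment ];
    }
    return @rows;
}

# Keep only the segments seen at a level greater than 0.
my @kept = map { $_->[1] }
           grep { $_->[0] > 0 }
           segments_with_level('x start a start b end c end y');
print join('|', @kept), "\n";
# prints:  a | b | c 
```

The text before the first tag (in $lead) is never stored, since the level is still 0 at that point.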
