Answer: How do I extract all text between two keywords like start and end?

by stephen (Priest)
on Apr 15, 2000 at 02:16 UTC (#7675, categorized answer)


Hmmm... I'm afraid the recursive 'between' above might not work for nested cases. The non-greedy regexp matches the first start-end pair it finds, so if we had:
yadda yadda start this is comment start this is still comment end this should still be comment end yadda yadda
then we should wind up with the whole thing, minus start and end and yadda, but instead we get:
this is comment start this is still comment
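
To see the failure for yourself, here is a minimal sketch. It assumes the earlier approach boils down to a single non-greedy match; `naive_between` is a hypothetical stand-in of mine, not the earlier poster's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a single non-greedy match between the keywords.
sub naive_between {
    my ($text) = @_;
    # .*? stops at the first 'end' that follows the first 'start',
    # so the inner 'start' is swallowed as ordinary text.
    return $text =~ m{start(.*?)end}s ? $1 : '';
}

my $text = 'yadda yadda start this is comment start this is still comment '
         . 'end this should still be comment end yadda yadda';

print naive_between($text), "\n";
```

The captured text still contains the inner 'start', and everything after the first 'end' is lost.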

The only way I can think of to get around this is by keeping external track of the levels. This also de-recurses it, which makes it less beautiful, but faster (in theory):

    sub between {
        my ($text) = @_;
        my $level    = 0;
        my @comments = ();

        while ( $text =~ m{\G .*? (start|end) (.*?) (?: (?=start|end) | $ ) }gxs ) {
            if ( $1 eq 'start' ) {
                $level++;
            }
            else {
                ( $level > 0 ) and $level--;
            }
            $level > 0 and push( @comments, $2 );
        }

        return join( '', @comments );
    }

This returns (give or take some whitespace around where the keywords were):

this is comment this is still comment this should still be comment

So what we're doing here is going through the text looking for 'start's and 'end's. We keep a counter indicating how many levels deep we are in 'start's and 'end's. Every time we hit a 'start', we add one. Every time we hit an 'end', we subtract one, checking first to make sure that our level doesn't go negative. (Otherwise, somebody could mess us up by starting a file "end end end".)
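
The counter on its own can be sketched like this, just to show what the "don't go negative" check buys us. `level_after` and its `$guard` switch are hypothetical names I made up for the illustration; they are not part of the function above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the nesting level after reading a list of tags.
# $guard switches the "never drop below zero" check on or off.
sub level_after {
    my ( $guard, @tags ) = @_;
    my $level = 0;
    for my $tag (@tags) {
        if ( $tag eq 'start' ) {
            $level++;
        }
        elsif ( !$guard or $level > 0 ) {
            $level--;
        }
    }
    return $level;
}

my @tags = qw(end end end start);
print 'with guard:    ', level_after( 1, @tags ), "\n";    # 1
print 'without guard: ', level_after( 0, @tags ), "\n";    # -2
```

Without the check, the stray 'end's leave the level at -2 after the first real 'start', so the text that follows would never be collected.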

Afterwards, we look at the patch of text between the current tag and the next start/end tag. If our level is greater than 0, we're between a 'start' and an 'end' tag, so we store that segment. Otherwise, we're not, so we skip it and keep looking for the next 'start' or 'end' tag until we reach the end of the text.
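
The same tag-then-segment walk can also be written with split instead of a \G regex. This is a different technique from the code above, offered only as a sketch of the idea; `between_split` is a hypothetical name. It relies on split returning the delimiters too when the pattern has a capture group.

```perl
use strict;
use warnings;

# Hypothetical variant: split on the keywords (the capture group makes split
# return them as well) and walk the pieces while tracking the nesting level.
sub between_split {
    my ($text) = @_;
    my $level = 0;
    my @comments;
    for my $piece ( split /(start|end)/, $text ) {
        if    ( $piece eq 'start' ) { $level++ }
        elsif ( $piece eq 'end' )   { $level-- if $level > 0 }
        elsif ( $level > 0 )        { push @comments, $piece }
    }
    return join '', @comments;
}

# Stray 'end's at the front are ignored; only text inside a pair is kept.
print between_split('end end end start kept end dropped'), "\n";
```

Like the regex version, this treats any occurrence of the letters 'start' or 'end' as a tag, even inside a longer word, so real input would probably want word boundaries or proper delimiters.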
