PerlMonks  

Re: Search all occurrences of text delimited by START and END in a string

by pme (Monsignor)
on May 12, 2015 at 13:28 UTC


in reply to Search all occurrences of text delimited by START and END in a string

Hi natol44,

You can simply process your file line by line.

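```perl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my $start = '<START>';
my $end   = '<END>';

# while reads one line at a time (foreach would read the whole input first)
while (<DATA>) {
    chomp;
    # Print the text between START and END, if this line has a pair
    print "$1\n" if /$start(.+)$end/;
}

__DATA__
<START>TEXT1<END>
<various data>
<START>TEXT2<END>
<various data>
<START>TEXT3<END>
```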
Alternatively, if your HTML files are not well formed, you could use HTML::Parser.
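Note that the line-by-line loop reports at most one match per line, and its greedy (.+) would run from the first <START> to the last <END> on a line holding several pairs (capturing e.g. "A<END> junk <START>B"). Below is a minimal sketch that collects every pair on each line, assuming pairs never span lines. extract_per_line is a hypothetical helper, and the in-memory filehandle stands in for a real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): assumes each START...END pair
# sits on a single line, but allows several pairs per line.
sub extract_per_line {
    my ($fh, $start, $end) = @_;
    my @found;
    while (my $line = <$fh>) {    # read one line at a time
        chomp $line;
        # Non-greedy (.+?) with /g returns every pair on the line instead of
        # one match stretching from the first START to the last END.
        push @found, $line =~ /\Q$start\E(.+?)\Q$end\E/g;
    }
    return @found;
}

# Sample input (made up): two pairs on the first line
my $text = "<START>A<END> junk <START>B<END>\n<various data>\n<START>C<END>\n";
open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
print "$_\n" for extract_per_line($fh, '<START>', '<END>');    # prints A, B, C
```

Opening a reference to a scalar as a filehandle (core Perl since 5.8) makes the sub easy to try without a file; any real filehandle works the same way.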
Re^2: Search all occurrences of text delimited by START and END in a string
by kennethk (Abbot) on May 12, 2015 at 15:04 UTC
While your answer is accurate within the posted spec, it's probably better form to make it newline-tolerant, and to encourage people who interpolate variables into regexes to escape metacharacters:
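```perl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my $start = '<START>';
my $end   = '<END>';

# Slurp the whole input: local $/ (undef) disables the record separator
my $data = do { local $/; <DATA>; };

# \Q...\E escapes metacharacters in the delimiters, /s lets . match
# newlines, /g finds every pair, and (.+?) stops at the nearest END
while ($data =~ /\Q$start\E(.+?)\Q$end\E/sg) {
    print "$1\n";
}

__DATA__
<START>TEXT1<END>
<various data>
<START>TEXT2<END>
<various data>
<START>TEXT3<END>
```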
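To see why escaping matters, here is a small illustration with an assumed delimiter <S.>, in which "." is a regex metacharacter. Interpolated unescaped, it also matches "<SX>"; wrapped in \Q...\E (the interpolation form of quotemeta), it matches only the literal text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed delimiters for illustration: '<S.>' contains the metacharacter '.'
my ($start, $end) = ('<S.>', '<E>');
my $data = "<SX>no<E> <S.>yes<E>";

my @raw     = $data =~ /$start(.+?)$end/g;           # '.' acts as a wildcard
my @escaped = $data =~ /\Q$start\E(.+?)\Q$end\E/g;   # literal '<S.>' only

print "unescaped: @raw\n";       # unescaped: no yes
print "escaped:   @escaped\n";   # escaped:   yes
```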
If memory is a concern and you'd rather not hold the whole file at once, there is a convenient choice of record separator:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my $start = '<START>';
my $end   = '<END>';

# Read one <END>-terminated record at a time instead of the whole file
local $/ = $end;
while (<DATA>) {
    while (/\Q$start\E(.+?)\Q$end\E/sg) {
        print "$1\n";
    }
}

__DATA__
<START>TEXT1<END>
<various data>
<START>TEXT2<END>
<various data>
<START>TEXT3<END>
```
I've kept the full regex here because the last record will not be <END>-terminated; relying on the record separator alone to mark the end would produce a bogus match for an unmatched <START>.
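The record-separator version can also be packaged as a sub. This is a sketch under assumptions: extract_by_records is a hypothetical name, and the sample text (including a trailing <START> with no <END>) is made up to show that a pair spanning a newline is found while the unterminated final record yields nothing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread) wrapping the $/ = $end approach.
sub extract_by_records {
    my ($fh, $start, $end) = @_;
    local $/ = $end;    # each read returns text up to and including the next $end
    my @found;
    while (my $record = <$fh>) {
        # The full START...END pattern is still required, so a final record
        # containing <START> but no <END> contributes nothing.
        push @found, $record =~ /\Q$start\E(.+?)\Q$end\E/sg;
    }
    return @found;
}

my $text = "<START>TEXT1<END>\n<various data>\n"
         . "<START>TEXT2\nspans lines<END>\n<START>unterminated\n";
open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
my @found = extract_by_records($fh, '<START>', '<END>');
print "[$_]\n" for @found;
```

Because $/ is set with local inside the sub, the caller's record separator is restored when the sub returns.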

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
