PerlMonks  

multiple-line regexes?

by jens (Pilgrim)
on Aug 19, 2002 at 05:43 UTC (#191086)
jens has asked for the wisdom of the Perl Monks concerning the following question:

I don't have a great deal of Perl experience, so a bit of patience with this question please if it seems basic.

I recently used Perl to automate marking up some data files with HTML for inclusion in a website. This, naturally, proved much faster than coding HTML by hand! However, there was one scenario that I couldn't figure out, and wound up doing manually--I needed to remove six-line blocks of data, but only if a particular pattern existed in the fourth line.

For future reference, how do you do this in Perl? Thanks very much for your reply.

jens

Re: multiple-line regexes?
by spurperl (Priest) on Aug 19, 2002 at 06:06 UTC
    In short, the /m and /s pattern modifiers should help when dealing with multi-line matches.
    m  Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.
    s  Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
    Read "perldoc perlre" for more information, or post a specific problem.
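The difference between the two modifiers is easiest to see on a small string. The following is only a sketch; the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical three-line sample string
my $text = "first line\nsecond line\nthird line\n";

# Without /m, ^ matches only at the very start of the string
my @plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after each embedded newline
my @multi = $text =~ /^(\w+)/mg;

# Without /s, "." stops at a newline, so the match cannot span two lines
my $no_s = $text =~ /first.*second/ ? 1 : 0;

# With /s, "." matches newlines too, so the match can cross lines
my $with_s = $text =~ /first.*second/s ? 1 : 0;

print "plain:      @plain\n";    # plain:      first
print "multi:      @multi\n";    # multi:      first second third
print "without /s: $no_s\n";     # without /s: 0
print "with /s:    $with_s\n";   # with /s:    1
```

The two modifiers are independent, so they can be combined (/ms) when a pattern needs both line anchors and a "." that crosses newlines.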
Re: multiple-line regexes?
by jens (Pilgrim) on Aug 19, 2002 at 06:55 UTC
    Here's some simplified code to give you an idea of what I wanted to accomplish:

    Example HTML code (produced by saving as HTML from OpenOffice Calc) simplified for sake of argument:

    <tag1>
    <tag2>
    <tag3>
    <tag4><MYTAG>is there stuff here?</MYTAG></tag4>
    </tag3>
    </tag2></tag1>

    If the contents between the MYTAG tags were *blank*, then I wanted to delete the entire six lines.



    --jens
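A single multi-line substitution can express this requirement directly. What follows is a minimal sketch, not a tested production solution. It assumes the whole file has been read into one string, that every line ends in a newline, and that "blank" means nothing but spaces or tabs between the MYTAG tags. The sub name remove_blank_blocks and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Remove every six-line block whose fourth line holds an empty <MYTAG>.
# Assumes newline-terminated lines; /m makes ^ match at each line start,
# and without /s the "." never crosses a newline.
sub remove_blank_blocks {
    my ($html) = @_;
    $html =~ s{
        (?: ^ .* \n ){3}                        # lines 1 to 3
        ^ .* <MYTAG> [ \t]* </MYTAG> .* \n      # line 4 with a blank MYTAG
        (?: ^ .* \n ){2}                        # lines 5 and 6
    }{}xmg;
    return $html;
}

# Hypothetical sample: the second block has a blank MYTAG
my $html = <<'END';
<tag1>
<tag2>
<tag3>
<tag4><MYTAG>is there stuff here?</MYTAG></tag4>
</tag3>
</tag2></tag1>
<tag1>
<tag2>
<tag3>
<tag4><MYTAG></MYTAG></tag4>
</tag3>
</tag2></tag1>
END

# Prints only the first six-line block
print remove_blank_blocks($html);
```

The /x modifier is there only so the pattern can be spread over several lines with comments. Like any approach that counts lines, this removes the three lines before and the two lines after a matching line, whatever they contain.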

      Instead of using a multiline regex, you could consider looping over the lines in turn and removing every six lines where there's a match on the fourth.

      I think the code is pretty self-explanatory (@lines contains your file):

      my $i = 3;                          # from line 4 ...
      while ($i < @lines - 2) {           # ... until last but 3
          if ($lines[$i] =~ m!<MYTAG></MYTAG>!) {
              splice(@lines, $i - 3, 6);  # kill 6 lines from $i - 3
          }
          else {
              $i++;
          }
      }
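To try the loop end to end, it can be wrapped in a sub and fed lines read from a filehandle. This is just a sketch: the sub name drop_blank_blocks and the sample data are hypothetical, and an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop: takes a list of lines and
# returns the list with matching six-line blocks removed.
sub drop_blank_blocks {
    my @lines = @_;
    my $i = 3;                              # index of the fourth line
    while ($i < @lines - 2) {               # two more lines must follow
        if ($lines[$i] =~ m!<MYTAG></MYTAG>!) {
            splice(@lines, $i - 3, 6);      # drop lines $i-3 .. $i+2
        }
        else {
            $i++;
        }
    }
    return @lines;
}

# Sample data; with a real file you would open its path instead
my $input = join '', map { "$_\n" } (
    '<tag1>', '<tag2>', '<tag3>',
    '<tag4><MYTAG></MYTAG></tag4>',
    '</tag3>', '</tag2></tag1>',
    '<p>kept</p>',
);

open my $in, '<', \$input or die "Cannot open input: $!";
my @lines = <$in>;
close $in;

print drop_blank_blocks(@lines);            # prints: <p>kept</p>
```

Note that $i is deliberately not advanced after a splice: the following lines shift down, so the same index already points at the fourth line of the next possible block.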

      — Arien

      Does your data always look like pseudo-HTML, i.e. is it always tagged? If so, you may get better results with HTML::TokeParser.

      here's some untested code:

      use HTML::TokeParser;

      # new() takes a file name or filehandle; pass \$html if $html holds the markup itself
      my $p = HTML::TokeParser->new($html) || die "Can't tokenize: $!";

      # get each <tag1> alone.
      while (my $token = $p->get_tag('tag1')) {
          # store the original text (of the <tag1> start tag)
          my $origtext = $token->[3];

          # get data between <MYTAG></MYTAG>; the parser reports
          # tag names in lower case, so ask for 'mytag'
          my $myTag = $p->get_tag('mytag');
          my $text  = $p->get_text('/mytag');

          if ($text ne '') {
              # tag is not empty.. so $origtext retains
              # the data we want..
              # ... do whatever with $origtext and move on
          }
      }

      He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

      Chady | http://chady.net/
