PerlMonks  

question on multi line pattern matching for html formatting

by tallCoolOne (Initiate)
on May 17, 2009 at 08:33 UTC

tallCoolOne has asked for the wisdom of the Perl Monks concerning the following question:

I'm having a hard time with what I'm sure is a simple thing. I am making html pages out of a large number of text files.
I have code that adds markup for everything I want except paragraph formatting.
All of the text I am concerned with is in an array (@ThisFileArray), and it's just a bunch of text, with paragraphs separated by an extra newline. Kind of like this:

Text blah blah, make my point here. More text and even more continuing on along the way.

New paragraph starts here, there were 2 newlines just before I started this paragraph, so I should somehow be able
to match on that, but I can't figure out how to search for that through my array.

If I could do that, then I would search something like:
s/\n\n/</p>\n\n\<p>/m
so that I would add a close paragraph tag at the end of the previous paragraph,
and an open paragraph tag at the beginning of the next.
Of course, I want to add a <p> tag at the beginning of the array, to insert it at the beginning
of the text, and then add one final </p> tag at the end of the text, to make everything really nice.
The end result would be (ideally) something like this:

<p>Text blah blah, make my point here. More text and even more continuing on along the way.</p>

<p>New paragraph starts here, there were 2 newlines just before I started this paragraph, so I should somehow be able
to match on that, but I can't figure out how to search for that through my array.</p>

I have tried this:
while (<@ThisFileArray>) { s/\n\n/</p>\n\n<p>/m }
But clearly I am missing some key ingredient here, because it doesn't work. Can anyone provide some enlightenment to this frustrated newbie? Thanks so much. Mark

Replies are listed 'Best First'.
Re: question on multi line pattern matching for html formatting
by wfsp (Abbot) on May 17, 2009 at 09:22 UTC
    ...the text I am concerned with is in an array (@ThisFileArray)...
    A for loop is usually better for looping over an array.
    ...with paragraphs separated by an extra newline....
    Perl can do that for you if you set $/ (see How can I read in a file by paragraphs?).
        #!/usr/bin/perl
        use strict;
        use warnings;

        {
            local $/ = q{};
            my @txt = <DATA>;
            for (@txt) {
                chomp;
                print qq{*$_*\n};
            }
        }

        __DATA__
        Text blah blah, make my point here. More text and even more continuing on along the way.

        New paragraph starts here, there were 2 newlines just before I started this paragraph, so I should somehow be able
        to match on that, but I can't figure out how to search for that through my array.

    Output:

        *Text blah blah, make my point here. More text and even more continuing on along the way.*
        *New paragraph starts here, there were 2 newlines just before I started this paragraph, so I should somehow be able
        to match on that, but I can't figure out how to search for that through my array.*
    Note that the extra pair of braces and local restrict the change to $/ (which is a global) to that block.

    I've added stars instead of tags. I would seriously consider using something like HTML::Element's new method to deal with creating HTML tags. YMMV.
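    To tie this back to the original goal, here is a minimal sketch of the same paragraph-mode read that emits <p> tags instead of stars. The sub name paragraphs_to_html and the in-memory filehandle are illustrative assumptions, not part of the reply above; a real script would open one of the text files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a filehandle in
# paragraph mode and return each paragraph wrapped in <p>...</p>.
sub paragraphs_to_html {
    my ($fh) = @_;
    local $/ = q{};    # paragraph mode: a record ends at one or more blank lines
    my @html;
    while ( my $para = <$fh> ) {
        chomp $para;   # in paragraph mode this strips all trailing newlines
        push @html, "<p>$para</p>";
    }
    return @html;
}

# Demo on an in-memory handle; the sample text is made up for illustration.
my $text = "First paragraph,\nsecond line.\n\nSecond paragraph.\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @html = paragraphs_to_html($fh);
close $fh;

print join( "\n\n", @html ), "\n";
```

    This sketch does not escape characters such as < or & in the source text, so it is only safe for text known to be free of them.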

      OK this is fantastic - perfect even. It works just as you describe. But to incorporate it into the rest of my script activities, I would like to capture the output of the print statement in an array if possible, and for some reason nothing that I try seems to work. If I use your print, it outputs to the display, with my html paragraph marks just great. But why can't I do something like:

      @newArray = qq{*$_*\n};

      ??
      I've played with every variation of this that I can think of, but I guess I am still too inexperienced to see the obvious. Can you help me put this output into a new array so that I can output it later, when I have all the other elements of my text file processed the way I want them?
      Thank you so much for a neat, clean solution.
      Mark
        OK, new guy has a typing problem. Turns out I WAS capturing the output in an array, I just mis-typed the name of that array later when I wanted to print it to see my results. So:

        @newArray = qq{*$_*\n};

        works. I am now figuring out how to pick up each paragraph that is formatted until I get to the end of the file, and save the completed, formatted output into the new array, for later output to the final html file.
        Sorry to use bandwidth when I didn't need to. *embarrasses self for first time in new community*
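        One caveat for anyone following along: inside a loop, a plain assignment such as @newArray = qq{*$_*\n}; replaces the array's contents on every pass, so only the last paragraph survives. A minimal sketch of accumulating every paragraph with push, where @paragraphs is a hypothetical array standing in for the records read in paragraph mode:

```perl
use strict;
use warnings;

# Hypothetical input: paragraphs already split and chomped.
my @paragraphs = ( 'Paragraph one.', 'Paragraph two.', 'Paragraph three.' );

my @newArray;
for my $para (@paragraphs) {
    # push appends to the array; "@newArray = ..." would overwrite it.
    push @newArray, qq{<p>$para</p>\n};
}

# Later, once the rest of the file has been processed:
print @newArray;
```

        The same result can be written in one statement as my @newArray = map { qq{<p>$_</p>\n} } @paragraphs; which may read more naturally once the loop body is this small.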
Re: question on multi line pattern matching for html formatting
by EvanCarroll (Chaplain) on May 17, 2009 at 09:51 UTC
    The first reply addresses doing this the right way. I'll go ahead and show you how to do it the wrong way, using line matching like you tried. We'll achieve the same effect, though.
        #!/usr/bin/perl -l
        use strict;
        use warnings;

        my $file;
        {
            local $/ = undef;
            $file = scalar <DATA>;
        }

        ## Find me two newlines or be the start of the file
        ## Followed by any amount of newlines
        ## Followed by one or more characters (our capture, your paragraph)
        ## Followed by any amount of newlines
        ## Followed by any two newlines or the end of the file
        ## Flags: treat the file as one line, continue search for more paragraphs
        ## (/g in scalar context: one match per pass through the loop)
        while (
            $file =~ m/
                (?>\n{2}|\A)
                \n*
                (.+?)
                \n*
                (?=\n{2}|\z)
            /sxg
        ) {
            print "<p>$1</p>";
        }

        __DATA__
        Paragraph one
        is this

        Paragraph two
        is this

        Paragraph three
        is this
        Paragraph three
        and this

        But stops here and doesn't get the rest of the newlines
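    For comparison, the substitution the original question was reaching for can work once the lines are joined into a single string. This is a minimal sketch, assuming @ThisFileArray holds one line per element with its newline intact (in which case no single element can ever contain two consecutive newlines). The sub name add_paragraph_tags is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: join the lines, then tag the paragraph boundaries.
sub add_paragraph_tags {
    my @lines = @_;
    my $text  = join q{}, @lines;
    $text =~ s/\A\n+//;                 # drop leading blank lines
    $text =~ s/\n+\z//;                 # drop trailing newlines
    return q{} if $text eq q{};         # nothing to wrap
    $text =~ s{\n\n+}{</p>\n\n<p>}g;    # brace delimiters avoid escaping the / in </p>
    return "<p>$text</p>\n";
}

# Sample data made up for illustration.
my @ThisFileArray = (
    "Text blah blah.\n",
    "More text.\n",
    "\n",
    "New paragraph starts here.\n",
);
print add_paragraph_tags(@ThisFileArray);
```

    Note the /g flag, so every paragraph break is replaced rather than only the first, and the alternate delimiters, since a bare / inside </p> would otherwise end the pattern early.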


    Evan Carroll
    The most respected person in the whole perl community.
    www.EvanCarroll.com
