
converting carriage returns to <br> tags (was: Simple Question for you guys)

by Anonymous Monk
on May 18, 2001 at 17:15 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

This is ridiculously easy but I can't get it (Relative newbie) Anyway....

I have a form with a textarea field that updates a news page. What I need to do is convert all the carriage returns in that field to <BR> tags.

I know I should be ashamed of myself for not knowing but I have been coding VBScript for the past 6 months and I can't seem to get out of the vibe!

Any help would be greatly appreciated,

Thanks in advance.

Lloyd.

Edit 2001-05-18 by mirod: changed title


Replies are listed 'Best First'.
Re: Simple Question for you guys.
by AidanLee (Chaplain) on May 18, 2001 at 17:19 UTC

    you just need to do a global search and replace on the field with a regular expression that swaps the newline with the tag:

    $fieldtext =~ s|\n|<br />|g # self terminating tag for XHTML compliance

    note i've changed the regex delimiter to a pipe ( | ) so i don't have to escape the '/' character in the br tag.

Re: Simple Question for you guys.
by blue_cowdawg (Monsignor) on May 18, 2001 at 17:24 UTC
    my $text=$cgi->param('mytextfield');
    $text =~ s/\n/\<br\>/g;

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Peter L. Berghold --- Peter@Berghold.Net
    "Those who fail to learn from history are condemned to repeat it."

      why have you escaped the < and > symbols? AFAIK they are not special inside a regex.

        Simple. I felt like it. When in doubt escape. It doesn't cost much and it certainly doesn't hurt.

        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        Peter L. Berghold --- Peter@Berghold.Net
        "Those who fail to learn from history are condemned to repeat it."
        
Re: Simple Question for you guys.
by voyager (Friar) on May 18, 2001 at 20:13 UTC
    As I learned when I posted a similar question, the textarea is likely to be giving you both newlines and carriage returns if the client is a PC. So you might want something like:
    $textarea =~ s|[\r\n]|<br />|g;
    Note: not sure if it's \r\n or \n\r.
      If the textarea is returned with \r\n line endings, then that substitution will insert two <BR /> tags at the end of each line.

      I prefer something like this:
      $textarea =~ s,\r\n?|\n\r?,<br />\n,g;
      I like to keep a newline after the BR tag, to make the HTML easier to read.
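To see the doubled-tag problem for yourself, here is a minimal sketch that runs both substitutions from this subthread on the same text. The sample string is made up for illustration, and it assumes the browser really did send \r\n line endings.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines as a Windows browser might submit them.
my $crlf_input = "first line\r\nsecond line\r\n";

# Character class: \r and \n are each replaced, so every CRLF yields two tags.
(my $by_class = $crlf_input) =~ s|[\r\n]|<br />|g;

# Alternation: a CRLF (or a lone \r or \n) is consumed as one line ending.
(my $by_alternation = $crlf_input) =~ s,\r\n?|\n\r?,<br />\n,g;

print "$by_class\n";
print $by_alternation;
```

The first print shows two tags per line ending, the second shows one tag followed by a newline.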

      why are you using
      s|[\r\n]|<br />|g;
      instead of
      s/[\r\n]/\<br \>/g;

      are the pipes somehow more efficient in this case or just more readable?

      the other thing i'm wondering about is the square brackets... does that mean  /[asdf\.]/ would search for each of those characters (a, s , d, f and Period) versus the string 'asdf.'

      i know they're silly questions but i'm trying to get back up to speed with reading perl ... after about 8 months in visual basic for applications

      there were so many REALLY nice things about working in VBA ... for example the editor will automatically display a list of the sub objects and methods of the object that you're working with... but then again the number of times i've struggled with the long way around a hash table or an array makes me really glad to be back in PerlScript (with a bit of Win32::OLE)

      anyways ... i'm rambling :)

        pipes are no more efficient as regex delimiters. You have it exactly right on your latter guess though, it's a whole lot more readable than

        s/[\r\n]/<br \/>/g;

        You've also guessed right about the brackets. They define a character class, which matches any single character from the set, in no particular order. So /[asdf\.]/ matches one a, s, d, f or period, not the string 'asdf.'.
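The difference between the bracketed class and the plain string can be checked with a short script. This is only a sketch; the test strings are invented for the example.

```perl
use strict;
use warnings;

# A character class matches exactly one character drawn from the set.
my $class_re   = qr/[asdf\.]/;
# Without brackets the characters must appear together, in this order.
my $literal_re = qr/asdf\./;

my @samples = ('s', 'fads', 'asdf.', 'xyz');   # made-up test strings
for my $sample (@samples) {
    printf "%-6s class: %-3s literal: %s\n",
        $sample,
        ($sample =~ $class_re   ? 'yes' : 'no'),
        ($sample =~ $literal_re ? 'yes' : 'no');
}
```

Only 'asdf.' should match the literal pattern, while every sample except 'xyz' contains at least one character from the class.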

        You can choose your own delimiters. Any time there are forward or back slashes in the regexp, it's better to use a different delimiter. "Leaning toothpick syndrome" is the name for the thing to avoid.
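As a rough illustration of the delimiter point, the three substitutions below are meant to be equivalent and differ only in how much escaping the tag needs. The sample text is made up.

```perl
use strict;
use warnings;

my $input = "a\nb\n";   # made-up sample text

(my $slashes = $input) =~ s/\n/<br \/>/g;    # '/' in the tag must be escaped
(my $pipes   = $input) =~ s|\n|<br />|g;     # no escaping needed
(my $braces  = $input) =~ s{\n}{<br />}g;    # paired delimiters also work

print "all three agree\n" if $slashes eq $pipes and $pipes eq $braces;
```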
      A good suggestion for getting the job done right (which I'm always a fan of), but it doesn't hurt to note that the browser will display things okay without converting both, as long as a <br /> tag is there.
        True. What caused me pain was wanting to convert two newlines to two BR tags, but leave single newlines alone.

        Trying to match on \n\n was not working because what was there was really \n\r\n\r.
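One possible way around the blank-line problem described above is to normalise the line endings first and only then look for doubled newlines. This is a sketch, not what the poster actually used: the helper name paragraphs_to_br is hypothetical, and the choice of output (two tags plus a newline) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each blank line (two line endings in a row)
# into two <br /> tags and leave single line endings untouched.
sub paragraphs_to_br {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\n/g;      # normalise CRLF, CR and LF to \n
    $text =~ s{\n\n}{<br /><br />\n}g;       # only doubled newlines become tags
    return $text;
}

print paragraphs_to_br("para one\r\nstill para one\r\n\r\npara two\r\n");
```

After normalising, the match on \n\n no longer depends on which line-ending style the client sent.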

Re: converting carriage returns to br tags (was: Simple Question for you guys)
by tachyon (Chancellor) on May 20, 2001 at 06:29 UTC
    A few comments on all these comments! First, this is really all you need for *most* circumstances:

    $textarea =~ s/\n/<BR>\n/g;

    We substitute <BR>\n so that we get the effect:

    Was:
    blah
    blah

    Now:
    blah<BR>
    blah

    If we sub just <BR> instead of <BR>\n we will get

    blah<BR>blah

    If you prefer to get

    blah
    <BR>
    blah

    then use \n<BR>\n as the sub pattern.

    Depending on platform, the \n sequence is converted by perl to:

    Unix: octal \012      hex 0xA     dec 10     LF    may be \n
    DOS:  octal \015\012  hex 0xD0xA  dec 13 10  CRLF  may be \r\n
    Mac:  octal \015      hex 0xD     dec 13     CR    may be \r

    Although perl tries to let you just use \n as your newline delimiter and sorts out the platform-dependent details for you, many common *internet protocols* specify the \015\012 sequence, and unfortunately the values of Perl's \n and \r are not reliable since they can and do vary from system to system. I suspect that $textarea is named from its HTML source, so you will probably want to use a truly portable solution like this:

    $textarea =~ s/\015\012|\015|\012/<br>\n/g;

    If you prefer hex to octal :-)

    $textarea =~ s/\xD\xA|\xD|\xA/<br>\n/g;

    If you are confused by the \012 or \xA notation, all this is saying to perl is: what I want you to match is the ASCII char decimal 10 == octal 12 == hex A == binary 1010

    In expanded commented /x form:

    $textarea =~ s/     # substitute
        \015\012        # a CRLF sequence (DOS, MIME...)
        |               # or
        \015            # a lone CR (mac)
        |               # or
        \012            # a lone LF (unix)
    /<br>\n/xg;         # with literal '<br>' plus newline
                        # /x allows comments in the pattern only, /g do globally

    There are flaws, both major and minor, with *all* solutions posted:

    s|\n|<br />|g
    # you don't need the unnecessary space or the / before the >
    # as \ is the escape char, this will sub '<br />' for \n!
    # rather than escape the > making it a literal which it is anyway.

    tr/\n/<BR>/s
    # you still need /g, not /s even allowing for using s instead of tr

    s/\n/\<br\>/g;
    # the escapes are correct but both unnecessary. This is the first
    # suggestion that will actually work (most of the time)

    s|[\r\n]|<br />|g;
    # this is wrong. Leaving aside the problems with using \r and \n
    # and the fact it will sub '<br />' the problem is this:
    # if we have \r\n we will get <br><br> (assuming we fix the sub)
    # with \r or \n we will get <br> so we get a different and platform
    # dependent result. This is partially fixed by changing to:

    s|[\r\n]+|<br>|g;
    # however if we have \r\n\r\n or \n\n or \r\r we get just one <br>
    # replacing a series of line breaks, probably not what we want

    s,\r\n?|\n\r?,<br />\n,g;
    # this suffers from \r \n problems, matches \n\r which is not a
    # desired result and subs in '<br />' again -> not an HTML tag

    Phew, I feel better now I've got that off my chest. Finally, for those that are not familiar with the concept, you may use *almost* any non-alphanumeric char as a regex delimiter. Thus we could use paired brackets:

    $textarea =~ s(\015\012|\015|\012)(<br>\n)g;

    Unpaired brackets:

    $textarea =~ s{\015\012|\015|\012}<<br>\n>g;

    Brackets then a pair of something else, even # chars:

    $textarea =~ s[\015\012|\015|\012]#<br>\n#g;

    With brackets we can split onto two lines:

    $textarea =~ s (\015\012|\015|\012)
                   [<br>\n]g;

    If using non brackets we can even use ; if you are into obfuscation:

    $textarea =~ s;\015\012|\015|\012;<br>\n;g;

    If our delimiter is included as a literal in the pattern we need to backslash it \ (escape it) to make it take on a literal meaning and match itself within the pattern, rather than be taken by perl as one of the regex delimiters.

    In a regex only these 12 characters need escaping, although when in doubt it *generally* does no harm to escape a character.

    \ | ( ) [ { ^ $ * + ? .

    All these chars have special meaning in a regex and if you do much with regexes you will soon get to know them by heart.

    Cheers

    tachyon
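The octal-escape substitution above can be tried against all three line-ending styles with a small script. This is a sketch: the wrapper name newlines_to_br and the sample strings are made up, and the comparison string assumes the script runs somewhere \n is a single character.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the octal-escape substitution.
sub newlines_to_br {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/<br>\n/g;
    return $text;
}

# One made-up sample per line-ending convention.
my %samples = (
    unix => "one\012two",
    dos  => "one\015\012two",
    mac  => "one\015two",
);

for my $style (sort keys %samples) {
    my $html = newlines_to_br($samples{$style});
    print "$style: ", ($html eq "one<br>\ntwo" ? "ok" : "differs"), "\n";
}
```

Putting \015\012 first in the alternation matters: if the lone \015 came first, a CRLF would be matched as two separate line endings.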
      FYI, the use of <br /> is intentional; it is an XHTML tag. XHTML is a rewrite of HTML conforming to the rules of XML.

      By the way, I don't think that there would be any problems with using this substitution in practice:
      s/\r\n?|\n\r?/<br>/g;
      It is true that the match of \012\015 may be avoided with the substitution you suggested:
      s/\015\012|\015|\012/<br>/g;
      But with the former, you don't have to remember whether \015 or \012 is supposed to come first. :)
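For anyone who wants to see where the two substitutions actually differ, here is a small comparison sketch. The sub names lenient and strict_order are made up, and it assumes a platform where \r is \015 and \n is \012.

```perl
use strict;
use warnings;

# Hypothetical name: accepts the two characters in either order.
sub lenient {
    my ($text) = @_;
    $text =~ s/\r\n?|\n\r?/<br>/g;
    return $text;
}

# Hypothetical name: treats only \015\012 as a two-character line ending.
sub strict_order {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/<br>/g;
    return $text;
}

my $crlf     = "x\015\012y";   # the usual CRLF order
my $reversed = "x\012\015y";   # LF followed by CR

print "CRLF:     ", lenient($crlf),     " vs ", strict_order($crlf),     "\n";
print "reversed: ", lenient($reversed), " vs ", strict_order($reversed), "\n";
```

The two agree on ordinary CRLF input and only part ways on the reversed sequence, where the second form emits a tag for each character.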

A reply falls below the community's threshold of quality. You may see it by logging in.
