PerlMonks  

Re^2: clean html tags

by dorward (Curate)
on Jan 26, 2007 at 10:14 UTC (#596686)


in reply to Re: clean html tags
in thread clean html tags

"'" => "&apos;",

The apos entity is an XML built-in, and isn't defined for HTML. While some browsers support it in text/html documents, this is error correction and you should not use it.

It's best to escape the data as it's coming in; otherwise it's very difficult to distinguish between, for example, a less-than sign that should be converted to &lt; and one that is part of the markup.

My preference is to convert from text to HTML at the last minute to avoid issues where I need to manipulate the data in Perl. (Template::Stash::EscapeHTML is quite cool).

What matters, though, is doing it in one place, so it's easy to spot when you forget to protect a bit of user input from XSS et al.
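To illustrate the "one place, at the last minute" idea, here is a minimal sketch using only core Perl. The names escape_html and render are hypothetical helpers invented for this example (they are not from Template::Stash::EscapeHTML or any other module), and the placeholder syntax is borrowed from the $IMAGE1 style template further down the thread.

```perl
use strict;
use warnings;

# The single table that decides how text becomes HTML.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',    # numeric reference; &apos; is not defined for HTML
);

# Hypothetical helper: escape one plain-text value for HTML output.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

# Hypothetical mini-template: every $NAME placeholder is escaped as it is
# substituted, so the data stays plain text until the moment of output.
sub render {
    my ($template, %vars) = @_;
    $template =~ s/\$([A-Z][A-Z0-9_]*)/escape_html($vars{$1})/ge;
    return $template;
}

print render(
    q{<img src='$IMAGE1' alt='$DESCRIPTION1'>},
    IMAGE1       => 'cat.png',
    DESCRIPTION1 => q{Tom's <cat>},
), "\n";
# prints: <img src='cat.png' alt='Tom&#39;s &lt;cat&gt;'>
```

Because render is the only path from data to markup, a value that reaches the page unescaped has to have bypassed it, which is much easier to spot in review than scattered ad hoc escaping.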


Re^3: clean html tags
by sgifford (Prior) on Jan 26, 2007 at 19:30 UTC
    The apos entity is an XML built-in, and isn't defined for HTML. While some browsers support it in text/html documents, this is error correction and you should not use it.
    Ah, that's interesting. I find it very useful to ensure that user-generated text doesn't break out of an HTML or JavaScript string, which is a big win IMHO. For example, if a template says:
    <img src='$IMAGE1' alt='$DESCRIPTION1'>
    I can be sure that $IMAGE1 and $DESCRIPTION1 won't mess up my HTML formatting if I can ensure they don't contain apostrophes, but otherwise it's impossible.

    Are you aware of any browsers that don't support this entity in HTML?

      Ah, that's interesting. I find it very useful to ensure that user-generated text doesn't break out of an HTML or JavaScript string

      You get the same effect if you use the numeric character reference as described in the document I previously linked to, or avoid delimiting attribute values with single quotes and use the more conventional double quotes.

      Are you aware of any browsers that don't support this entity in HTML?

      Not off the top of my head, but using it in text/html is non-standard, and it's easy to avoid.

        To follow up: I ignored dorward's advice and left this in, and it turns out it doesn't work well in some little browser called "Internet Explorer," which apparently some people like to use. :-)

        Changing &apos; to &#39; fixed the problem, as he suggested it would.
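For anyone with existing output that already contains the named entity, the change described above can be sketched as a one-line substitution. The sub name fix_apos is hypothetical, and this is only a stopgap under the assumption that the markup was generated by your own escaping code; changing the escaping table itself is the cleaner fix.

```perl
use strict;
use warnings;

# Hypothetical fix-up: replace the XML-only named entity with the
# numeric character reference, which is valid in HTML as well.
sub fix_apos {
    my ($html) = @_;
    $html =~ s/&apos;/&#39;/g;
    return $html;
}

print fix_apos(q{<img src='cat.png' alt='Tom&apos;s cat'>}), "\n";
# prints: <img src='cat.png' alt='Tom&#39;s cat'>
```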
