Unicode characters in <code> blocks
by Joost (Canon) on Oct 07, 2006 at 22:59 UTC
This node by Nik confused me greatly.
Apparently, if you post Unicode characters in a <code> block, you see their numeric HTML entities instead of the characters themselves. I suspect some kind of double-encoding bug.
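To make the double-encoding guess concrete, here is a minimal sketch in Python (used only for illustration). The helpers `to_numeric_refs` and `escape_for_html` are hypothetical and say nothing about how the site's code actually works. It assumes the characters are first turned into numeric references and then HTML-escaped a second time, which is enough to produce exactly the symptom described:

```python
import html

def to_numeric_refs(text: str) -> str:
    """Hypothetical first layer: replace each non-ASCII char with &#NNNN;."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)

def escape_for_html(text: str) -> str:
    """Hypothetical second layer: escape &, <, > and quotes for literal display."""
    return html.escape(text)

original = "حيض,πβΫ"

# One layer of encoding: a browser decodes &#1581; back into the character.
once = to_numeric_refs(original)

# Two layers: every "&" becomes "&amp;", so the browser shows the literal
# text "&#1581;" instead of the character.
twice = escape_for_html(once)

print(once)   # &#1581;&#1610;&#1590;,&#960;&#946;&#939;
print(twice)  # &amp;#1581;&amp;#1610;&amp;#1590;,&amp;#960;&amp;#946;&amp;#939;

# What the reader sees after the browser decodes the markup once:
print(html.unescape(twice) == once)  # True
```

The point of the sketch is that either layer alone is harmless; only applying an escaping step to text that is already escaped leaves visible `&#number;` sequences on the page.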
Here are some Unicode characters I picked at random:
حيض,πβΫ
Here are the same characters in a <pre> block:
And here are the same characters in a <code> block:
Note that I didn't type any HTML numeric references myself; I just copied and pasted the characters from the GNOME "charmap" program into the text field.
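One plausible way numeric references can appear without anyone typing them, assuming the page and its form use ISO-8859-1 rather than UTF-8: browsers typically submit characters that the form's encoding cannot represent as decimal numeric character references. Python's `xmlcharrefreplace` error handler performs the same substitution, so this sketch imitates what such a form would send. The ISO-8859-1 charset is an assumption, not something established in this thread:

```python
# Assumption: the form is submitted as ISO-8859-1, which has no Arabic or
# Greek letters, so each such character is replaced by &#NNNN; on submit.
typed = "حيض,πβΫ"

submitted = typed.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")
print(submitted)  # &#1581;&#1610;&#1590;,&#960;&#946;&#939;

# With a UTF-8 form, the characters arrive intact as UTF-8 bytes instead.
utf8_bytes = typed.encode("utf-8")
print(utf8_bytes.decode("utf-8") == typed)  # True
```

If that is what happens here, the server already receives entity text, and any further escaping of it yields the doubled form shown in the HTML source.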
Below is the relevant part of this page's HTML source:
As you can see, in the <code> block the ampersands of the numeric entities are incorrectly escaped. I think this is a pretty serious issue now that Perl can handle native UTF-8 source.
Also, I notice that these characters are doubly escaped in the textarea I'm typing in right now: you can enter Unicode characters in the text field, but after a preview the field just shows a bunch of &#number; entries.
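For comparison, here is a minimal sketch of a preview textarea that round-trips the text without this problem, assuming the page is served as UTF-8. The names `render_textarea` and `parse_textarea` are hypothetical. The idea is to escape the text exactly once; browsers decode character references inside a `<textarea>`, so the user gets back exactly what they typed:

```python
import html

def render_textarea(user_text: str) -> str:
    """Hypothetical preview helper: escape the text exactly once."""
    # With quote=False, html.escape only touches &, < and >; non-ASCII
    # characters pass through unchanged, which is fine on a UTF-8 page.
    return f'<textarea name="text">{html.escape(user_text, quote=False)}</textarea>'

def parse_textarea(markup: str) -> str:
    """Recover the text a browser would display (and resubmit) from the markup."""
    start = markup.index(">") + 1          # end of the opening <textarea ...> tag
    end = markup.rindex("</textarea>")     # start of the closing tag
    return html.unescape(markup[start:end])

text = "حيض,πβΫ & <code>"
page = render_textarea(text)
print(page)  # <textarea name="text">حيض,πβΫ &amp; &lt;code&gt;</textarea>
print(parse_textarea(page) == text)  # True: the round trip is lossless
```

Even text that merely looks like an entity, such as a literal `&#1581;`, survives this round trip, because it is escaped once and decoded once.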