I don't know anything about UCS

UCS is essentially a legacy set of encodings for Unicode. UCS-2 is a two byte encoding, UCS-4 uses four bytes.

UCS-2 is very similar to UTF-16, except that only characters in the BMP are allowed. UCS-2 has no concept of surrogates. You can read UCS-2 like you would read UTF-16. And if you write UTF-16 without surrogates, you also have written UCS-2. UTF-16 with surrogates is not compatible with UCS-2.
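To make the surrogate point concrete, here is a minimal sketch in Python (chosen only for illustration; the helper names `is_ucs2_representable` and `utf16_code_units` are made up for this example). It assumes big-endian UTF-16 without a byte order mark. A BMP character yields a single 16-bit code unit, which is also valid UCS-2, while a character outside the BMP yields a surrogate pair that UCS-2 cannot express.

```python
def is_ucs2_representable(text: str) -> bool:
    """Return True if every character lies in the BMP (code point <= 0xFFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of the big-endian UTF-16 encoding (no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_text = "\u20ac"         # EURO SIGN, inside the BMP
astral_text = "\U0001F600"  # an emoji, outside the BMP

# One code unit: the same two bytes are valid UTF-16 and valid UCS-2.
print([hex(u) for u in utf16_code_units(bmp_text)])

# Two code units (a surrogate pair): valid UTF-16, not expressible in UCS-2.
print([hex(u) for u in utf16_code_units(astral_text)])
```

If `is_ucs2_representable` returns True for a string, its UTF-16 encoding contains no surrogates and can be read as UCS-2.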

UCS-4 is very similar to UTF-32. It is capable of encoding 2^31 characters (the sign bit is fixed to 0), but its definition is artificially limited to the range 0..0x10FFFF to stay compatible with other Unicode encodings. Because of this limitation, UCS-4 and UTF-32 encode all characters in an identical way.
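As a rough illustration of the four-byte form, again in Python and with a hypothetical helper name (`utf32_value`): each character is stored as its code point in a single 32-bit unit, and the upper limit 0x10FFFF is enforced. The sketch assumes big-endian UTF-32 without a byte order mark.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point allowed by Unicode


def utf32_value(ch: str) -> int:
    """Return the 32-bit value stored for a single character in UTF-32BE."""
    data = ch.encode("utf-32-be")  # always 4 bytes per character
    return int.from_bytes(data, "big")


# The stored value is simply the code point, inside or outside the BMP.
print(hex(utf32_value("\u20ac")))
print(hex(utf32_value("\U0001F600")))

# Values above the limit are rejected, although they would fit in 31 bits.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as err:
    print("rejected:", err)
```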

See also Universal Character Set, "Unicode Encodings" and "Beyond Unicode code points" in perlunicode.

More "Unicode and Perl" stuff: perlunicode, perlunicook, perlunifaq, perluniintro, perlunitut, Encode


Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

