in reply to UTF8 Validity

This is slightly off-topic here, but perhaps it's useful for someone some day:

UTF8 ne UTF-8. A string can be valid UTF8 without being valid UTF-8: UTF-8 is stricter and allows only one way to encode each code point, whereas UTF8 also accepts non-canonical encodings.
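Here is a small sketch of the "non-canonical encoding" point. It is written in Python purely for illustration. `lax_decode` is a hypothetical decoder that accepts overlong (non-canonical) sequences, in the spirit of the looser "UTF8" described above. It is not meant to reproduce any particular library's exact rules. `is_strict_utf8` relies on Python's built-in `utf-8` codec, which rejects overlong forms. The bytes `C0 AF` are a two-byte overlong encoding of `/`, whose only canonical UTF-8 form is the single byte `2F`.

```python
def lax_decode(data: bytes) -> str:
    """Hypothetical lax decoder: accepts overlong (non-canonical) sequences."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Work out how many continuation bytes follow and the lead byte's payload bits.
        if lead < 0x80:
            n, cp = 0, lead
        elif 0xC0 <= lead < 0xE0:
            n, cp = 1, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            n, cp = 2, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            n, cp = 3, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            cont = data[i + k]
            if cont & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        # Note: no check that cp needed this many bytes, so overlong forms pass.
        if cp > 0x10FFFF:
            raise ValueError(f"code point 0x{cp:X} out of range")
        out.append(chr(cp))
        i += n + 1
    return "".join(out)


def is_strict_utf8(data: bytes) -> bool:
    """Strict check using Python's utf-8 codec, which rejects overlong forms."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for raw in (b"/", b"\xC0\xAF", b"\xE0\x80\xAF"):
        print(raw, repr(lax_decode(raw)), is_strict_utf8(raw))
```

All three byte strings decode to `/` under the lax decoder. Only the single byte `2F` passes the strict check. The same string therefore has one valid UTF-8 spelling but several that a lax decoder accepts.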

That was my first thought when I read the title "UTF8 Validity", which is not "UTF-8 Validity" ;-)