PerlMonks
Re^3: Composite Charset Data to UTF8? by Corion (Patriarch)
on Jun 19, 2013 at 12:07 UTC (id://1039767)
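Have a look at the encoding rules of UTF-8. A valid UTF-8 sequence starts either with 0b0xxxxxxx (a single-octet ASCII character) or with 0b11xxxxxx (the lead octet of a multi-octet character). Octets of the form 0b10xxxxxx only ever appear as continuation octets inside a multi-octet character, so any sequence that starts with 0b10xxxxxx is invalid UTF-8.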
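The bit patterns above can be checked with a mask and a compare. The following is a minimal sketch; the sub name `octet_kind` is hypothetical and only illustrates the three cases. As a simplification it labels every 0b11xxxxxx octet as a lead octet, although a few of those values never occur in valid UTF-8.

```perl
use strict;
use warnings;

# Classify one octet (an integer 0..255) by its high bits, following
# the UTF-8 rules above. Simplification: every 11xxxxxx octet is
# reported as 'lead', even though 0xC0, 0xC1 and 0xF5..0xFF never
# occur in valid UTF-8.
sub octet_kind {
    my ($octet) = @_;
    # Parentheses matter: in Perl, & binds more loosely than ==.
    return 'ascii'        if ($octet & 0b1000_0000) == 0b0000_0000; # 0xxxxxxx
    return 'continuation' if ($octet & 0b1100_0000) == 0b1000_0000; # 10xxxxxx
    return 'lead';                                                  # 11xxxxxx
}

# "é" in UTF-8 is C3 A9: one lead octet followed by one continuation octet.
for my $octet (0x41, 0xC3, 0xA9) {
    printf "0x%02X => %s\n", $octet, octet_kind($octet);
}
```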
An untested, easy check could be to match your string against /[\x80-\xBF]/: the range 0x80 to 0xBF is exactly the hex representation of the bit pattern 0b10xxxxxx identified above.
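A runnable version of that check, wrapped in a hypothetical helper `has_continuation_octet`, might look like the sketch below. It assumes the data is held as raw octets (not already decoded into Perl characters). Note that the match is a heuristic: a correctly encoded non-ASCII character also contains octets in 0x80 to 0xBF ("é" is C3 A9 in UTF-8), and Latin-1 letters such as é (0xE9) fall outside that range. For comparison, the sketch also asks the core Encode module to decode the octets strictly with `FB_CROAK`; `is_valid_utf8` is likewise a name chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# The quick check: does the octet string contain any octet in the
# 0x80..0xBF range (bit pattern 10xxxxxx)?
sub has_continuation_octet {
    my ($bytes) = @_;
    return $bytes =~ /[\x80-\xBF]/ ? 1 : 0;
}

# Stricter alternative: try to decode the octets as UTF-8 and report
# whether that succeeded. FB_CROAK makes decode() die on malformed input.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may modify its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $latin1_cafe = "caf\xE9";     # "café" in Latin-1: 0xE9 looks like a lead octet
my $stray       = "\xA9 2013";   # Latin-1 copyright sign: 0xA9 = 0b10101001
my $utf8_cafe   = "caf\xC3\xA9"; # "café" in UTF-8: 0xA9 is a valid continuation

for my $s ($latin1_cafe, $stray, $utf8_cafe) {
    printf "%-20s quick=%d valid=%d\n",
        join(' ', map { sprintf '%02X', ord } split //, $s),
        has_continuation_octet($s), is_valid_utf8($s);
}
```

Here the quick check flags the second and third strings but not the first, while the strict decode accepts only the third. The regex is cheap and may be enough when the non-UTF-8 data is known to contain characters in that range; the decode is the safer test when it is not.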
In section: Seekers of Perl Wisdom