I included examples of each in the above test program. Note that some of the examples are multiple bytes (#1 below, for example, is two characters, one of three bytes and one of two). As far as I can tell, the formats are:
1. UTF-8: chr(226).chr(152).chr(134), chr(195).chr(161)
2. CP1252: chr(150), chr(153)
3. HTML: '®', 'Æ'
4. ASCII: '&'
5. Unicode codepoints: chr(63743), chr(991), chr(9760)
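For reference, here is a rough sketch of what the UTF-8, CP1252 and code point samples above decode to, and what happens when the bytes are read with the wrong encoding. It is written in Python only because Python makes the byte/character distinction easy to show. It is an illustration, not part of my test program. The names `UTF8_SAMPLE`, `describe` and so on are made up for this sketch. The HTML and ASCII samples are left out because they are already plain text.

```python
import unicodedata

# Raw values copied from the list above (names are hypothetical, for illustration only).
UTF8_SAMPLE = bytes([226, 152, 134, 195, 161])  # format 1: two UTF-8 characters (3 bytes + 2 bytes)
CP1252_SAMPLE = bytes([150, 153])               # format 2: two single-byte CP1252 characters
CODEPOINTS = [63743, 991, 9760]                 # format 5: Unicode code points


def describe(text):
    """List each character as (char, 'U+XXXX', Unicode name or '<unnamed>')."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def decoded_samples():
    """Decode each sample with the encoding it appears to be stored in."""
    return {
        "UTF-8": UTF8_SAMPLE.decode("utf-8"),
        "CP1252": CP1252_SAMPLE.decode("cp1252"),
        "code points": "".join(chr(cp) for cp in CODEPOINTS),
    }


if __name__ == "__main__":
    for label, text in decoded_samples().items():
        print(label, describe(text))

    # Same bytes, wrong encoding: the UTF-8 bytes read as CP1252 turn into mojibake,
    # while the CP1252 bytes are not valid UTF-8 at all.
    print(UTF8_SAMPLE.decode("cp1252"))
    try:
        CP1252_SAMPLE.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)
```

Decoded correctly, #1 is U+2606 (white star) followed by U+00E1 (a with acute), #2 is an en dash and a trade mark sign, and #5 is U+F8FF (private use, so it has no name), U+03DF and U+2620. The same UTF-8 bytes read as CP1252 come out as five unrelated characters, and the CP1252 bytes fail outright as UTF-8. That is why a single column mixing these formats can't simply be decoded one way.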
Obviously the database is a bit 'special'. Unfortunately it is provided by a 3rd party, a very large company, and I have no control over their input sanitization.