
where "【" should be 12304 when encoded, but it became split into 3 parts: 227, 128, 144

Yes, that is correctly encoded UTF-8. Codepoints from U+0800 to U+FFFF (excluding the surrogate range U+D800 to U+DFFF) are encoded with 3 bytes. The codepoint 12304 is 0x3010 in hex, conventionally written U+3010 in Unicode notation, and it encodes as the three bytes 0xE3, 0x80, 0x90. Working it out:

Codepoint 12304 (codepoint 0x3010 in hex, U+3010)

hex 0x3010
hex 3    0    1    0
bin 0011 0000 0001 0000
    xxxx yyyy yyzz zzzz     (x, y, and z mark the three groups of bits: 4, 6, and 6 bits)

encoding (3-byte form):
    ....xxxx ..yyyyyy ..zzzzzz  (x, y, z as above; dots are the bits fixed by UTF-8: 1110 on the lead byte, 10 on each continuation byte)
bin 11100011 10000000 10010000

hex E3       80       90
dec 227      128      144

... which is exactly what you listed.
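If it helps to see the same bit-shuffling as code, here is a minimal sketch in Python (not Perl, and not how any library actually implements it). The function name `utf8_encode_3byte` is purely illustrative. It splits the codepoint into the 4/6/6-bit x, y, z groups from the layout above, ORs in the fixed 1110 and 10 prefixes, and checks the result against Python's built-in UTF-8 encoder.

```python
def utf8_encode_3byte(cp: int) -> bytes:
    """Hand-encode a codepoint in U+0800..U+FFFF (no surrogates) as 3 UTF-8 bytes.

    Illustrative helper only; real code should use the language's own encoder.
    """
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a 3-byte UTF-8 scalar value")
    x = (cp >> 12) & 0x0F   # top 4 bits    -> lead byte 1110xxxx
    y = (cp >> 6) & 0x3F    # middle 6 bits -> continuation byte 10yyyyyy
    z = cp & 0x3F           # low 6 bits    -> continuation byte 10zzzzzz
    return bytes([0xE0 | x, 0x80 | y, 0x80 | z])


encoded = utf8_encode_3byte(12304)        # U+3010
print(list(encoded))                      # [227, 128, 144]
print(encoded.hex(" ").upper())           # E3 80 90
assert encoded == chr(12304).encode("utf-8")
```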

(This is, by the way, why Corion told you to look for charset=utf-8 in the Content-Type header: he recognized those three bytes as the UTF-8 encoding of LEFT BLACK LENTICULAR BRACKET (U+3010).)
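Once you know the charset, the fix is to decode the bytes into characters before matching. A small Python sketch of that step follows; in Perl the analogous call would be `decode('UTF-8', $bytes)` from the Encode module. The variable names are illustrative, and the example assumes the page really is UTF-8 as its Content-Type claims.

```python
import unicodedata

raw = bytes([227, 128, 144])      # the three bytes as received
text = raw.decode("utf-8")        # interpret them using the declared charset

print(len(raw), len(text))        # 3 1
print(f"U+{ord(text):04X}", ord(text))  # U+3010 12304
print(unicodedata.name(text))     # LEFT BLACK LENTICULAR BRACKET

# After decoding, a character-level match works as expected
assert "【" in text
```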

In reply to Re^3: String match in Chinese character by pryrt
in thread String match in Chinese character by hankcoder
