in reply to
Re^2: Unicode substitution regex conundrum
in thread Unicode substitution regex conundrum
If you don't know the encoding of your data, how should Perl know it?
If you don't set CGI's charset parameter, it returns a byte string that you have to decode into Perl's internal format... but I wrote that already.
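To make the decoding step concrete, here is a minimal sketch. It assumes the bytes really are UTF-8; the sample value is made up and just stands in for whatever your CGI parameter delivers.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 bytes for "Grüße", as a form parameter
# might arrive when nothing has decoded it yet.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# Turn the bytes into a Perl character string.
# Assumption: the input is UTF-8. Use the real encoding name if it differs.
my $text = decode('UTF-8', $bytes);

# The byte string is longer than the character string because
# "ü" and "ß" each take two bytes in UTF-8.
printf "%d bytes, %d characters\n", length($bytes), length($text);
```

Before decoding, length() counts bytes; afterwards it counts characters.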
You can try Encode::Guess if you have only a few candidate charsets that aren't too similar and your input data is long enough.
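A rough sketch of how that could look. The sample bytes are made up, and the second call only illustrates what happens when the candidates are too similar: Latin-1 accepts any byte sequence, so it can never be told apart from UTF-8 this way.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical input: UTF-8 bytes for a short German phrase.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln";

# With the default suspects (ascii and utf8) the guess is unambiguous.
# guess_encoding returns an encoding object on success,
# and a plain error string on failure.
my $enc = guess_encoding($bytes);
my $text;
if (ref $enc) {
    $text = $enc->decode($bytes);
    print "Guessed: ", $enc->name, "\n";
}
else {
    print "Could not guess: $enc\n";
}

# Adding latin1 as a candidate makes the guess ambiguous, because
# every byte sequence is also valid Latin-1.
my $ambiguous = guess_encoding($bytes, 'latin1');
print "Ambiguous guess: $ambiguous\n" unless ref $ambiguous;
```

Always check ref() on the result before calling decode on it, since a failed guess is just a string.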
If that doesn't hold, you have to make sure your input arrives in a known encoding. In HTML forms, for example, you can set the accept-charset attribute to UTF-8 only (accept-charset="UTF-8").
And if you want Unicode semantics in regex matches, check with Encode::is_utf8($string) that the string is indeed in Perl's internal format (that is, that its UTF8 flag is set).
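Here is a small sketch of why this matters for regexes. It assumes default regex semantics (no locale, no unicode_strings feature, no /u or /a modifier), and the sample string is invented.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical input: the UTF-8 bytes for "été".
my $bytes = "\xC3\xA9t\xC3\xA9";
my $text  = decode('UTF-8', $bytes);

# The same pattern, applied to the raw bytes and to the decoded string.
# On the byte string the high bytes are not word characters,
# on the decoded string "é" is a letter.
my $bytes_match = ($bytes =~ /\A\w+\z/) ? 1 : 0;
my $text_match  = ($text  =~ /\A\w+\z/) ? 1 : 0;

printf "bytes: flag=%s match=%d\n", (is_utf8($bytes) ? 'on' : 'off'), $bytes_match;
printf "text:  flag=%s match=%d\n", (is_utf8($text)  ? 'on' : 'off'), $text_match;
```

Only the decoded string is matched as a single word, which is the Unicode behaviour you are after.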