If you don't know the encoding of the data, how should Perl know it?
in reply to Re^2: Unicode substitution regex conundrum
in thread Unicode substitution regex conundrum
When you don't set CGI's charset parameter, it returns a byte string that you have to decode into Perl's internal format... but I wrote that already.
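To make the decoding step concrete, here is a minimal sketch using Encode::decode. It assumes the incoming bytes are UTF-8; the sample value in $bytes is made up for illustration and merely stands in for whatever your CGI parameter returned.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this stands in for a raw parameter value, and it is UTF-8 encoded.
# "caf\xC3\xA9" is the five-byte UTF-8 encoding of the four-character word "café".
my $bytes = "caf\xC3\xA9";

# FB_CROAK makes decode() die on malformed input instead of silently patching it.
# LEAVE_SRC keeps decode() from modifying $bytes in place.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

The byte string and the decoded string hold the same text, but only the decoded one counts "é" as a single character.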
You can try Encode::Guess if there are only a few possible charsets, they aren't too similar, and your input data is long enough.
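A rough sketch of how that might look. The helper decode_by_guess is a name invented for this example, not part of Encode::Guess; guess_encoding itself is the function the module exports. It returns an encoding object when exactly one candidate fits, and a plain error string otherwise.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: decode $bytes if the guess is unambiguous,
# otherwise return undef so the caller can fall back to something else.
sub decode_by_guess {
    my ($bytes, @suspects) = @_;

    # With no extra suspects, Encode::Guess tries ASCII, UTF-8 and
    # BOM-marked UTF-16/32 by default.
    my $enc = guess_encoding($bytes, @suspects);

    # On failure (no match, or several matches) $enc is an error message,
    # not an object.
    return undef unless ref $enc;

    return $enc->decode($bytes);
}

# Sample input made up for illustration: the UTF-8 bytes of "café".
my $guessed = decode_by_guess("caf\xC3\xA9");
print defined $guessed ? "decoded\n" : "could not guess\n";
```

Adding suspects that overlap heavily (for example several single-byte Latin charsets) tends to make the guess ambiguous, which is why this only works for a small set of dissimilar candidates.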
If that doesn't hold, you have to make sure your input arrives in a known encoding. For example, in HTML forms you can set the accept-charset attribute to UTF-8 only.
And when you want Unicode semantics in regex matches, check with Encode::is_utf8($string) that the string is indeed in Perl's internal format.
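A small sketch of that check, again assuming UTF-8 input and a made-up sample value. It shows the same text as a byte string and as a decoded string, and how \w behaves on each when no locale or unicode_strings feature is in effect.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Assumption: the input is UTF-8; the bytes below spell "café".
my $raw     = "caf\xC3\xA9";
my $decoded = decode('UTF-8', $raw);

# is_utf8() reports whether the string carries Perl's internal UTF8 flag.
my $raw_flag     = is_utf8($raw)     ? 1 : 0;
my $decoded_flag = is_utf8($decoded) ? 1 : 0;

# On the byte string the two high bytes are not word characters,
# so the whole-string match fails; on the decoded string "é" is one
# word character and the match succeeds.
my $raw_is_word     = $raw     =~ /^\w+$/ ? 1 : 0;
my $decoded_is_word = $decoded =~ /^\w+$/ ? 1 : 0;

print "raw:     flag=$raw_flag word=$raw_is_word\n";
print "decoded: flag=$decoded_flag word=$decoded_is_word\n";
```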