When reading CGI form fields in a multilingual UTF-8 web application, it is not feasible to use the standard idioms for stripping evil characters:
$string =~ s/[^\w\s\.\,]//g; #plus any other metachars you want
$rawstring =~ m/([\w\s\.\,]+)/; #plus any others...
$string = $1;
Since users will be sending me all kinds of high bytes as part of multi-byte UTF-8 sequences, I need to be more accepting, as I understand it.
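For reference, here is a minimal sketch of what a more accepting filter might look like, assuming the raw parameter bytes are decoded into Perl characters first. On a decoded string, \w matches Unicode letters, so the same character class lets accented text through. The helper name clean_field and the sample input are made up for illustration; only Encode (a core module) is used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw UTF-8 bytes into characters, then keep
# only word characters, whitespace, periods and commas.
sub clean_field {
    my ($raw_bytes) = @_;

    # FB_CROAK makes decode() die on malformed UTF-8 instead of guessing;
    # LEAVE_SRC stops decode() from modifying $raw_bytes in place.
    my $chars = decode('UTF-8', $raw_bytes,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);

    # On a decoded (character) string, \w uses Unicode rules.
    $chars =~ s/[^\w\s.,]//g;
    return $chars;
}

# Sample input (an assumption): UTF-8 bytes for "naive" and "cafe" with
# accents, plus some markup and a semicolon that should be stripped.
my $cleaned = clean_field("na\xC3\xAFve; <b>caf\xC3\xA9</b>");
```

Whether decoding belongs in the filter or earlier in the request handling is a design choice; the sketch only shows that the whitelist itself need not be limited to ASCII.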
However, the CGI security problem with the null byte remains. Since the null byte can be used to fool various resources, I am tempted to subclass CGI and have the param() method do a s/\x00//g on *everything*. Is this ill-advised? That is, might the null byte ever show up in valid UTF-8 text?
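On the narrower encoding question: as far as I can tell, every byte of a multi-byte UTF-8 sequence has its high bit set, so 0x00 should only ever be the encoding of U+0000 itself. Below is a rough sketch that checks this empirically over the Basic Multilingual Plane, using only the built-in utf8::encode. The variable names are mine, and the loop is a sanity check rather than a proof.

```perl
use strict;
use warnings;

# Collect every BMP code point whose UTF-8 encoding contains a 0x00 byte.
my @with_nul;
for my $cp (0 .. 0xFFFF) {
    next if $cp >= 0xD800 && $cp <= 0xDFFF;   # skip surrogates
    my $bytes = chr $cp;
    utf8::encode($bytes);                     # characters -> UTF-8 bytes, in place
    push @with_nul, $cp if index($bytes, "\x00") >= 0;
}

# List the offenders, one per line.
printf "U+%04X\n", $_ for @with_nul;
```

If that holds, stripping \x00 cannot damage any other character; it only removes literal NUL characters from the text.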
Remember, CGI uploads of binary files are handled through a different mechanism, so those would not be affected by overriding param(). Does a wise man always strip null bytes from param() return values, and if so, why isn't that the default behavior?