Your code will call anything with whitespace an unsafe string. While that's much better than no checking, how about:
$string =~ s/[^\w\s]+//g; ##add other allowed chars as needed
That will sanitize all strings to contain only letters, digits, the underscore and whitespace. A more complete regex (which would still not include Unicode or international chars) would be:
$string =~ s/[^\w\s\!\@\#\$\%\^\&\*\(\)\\\`\~\-\+\=\,\.]+//g;
(Yes, there's more escaping there than strictly necessary.) Suddenly, that transliteration is looking a lot easier to maintain. If your allowed set is "everything but nulls and control chars", then you're better off explicitly excluding the known control-char set.
Denying all, then allowing is a good general rule of thumb. But, in this case, the "dangerous" items are a fixed set while the "safe" items are much more variable -- so it makes sense to simply remove that which is dangerous.
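To make the "remove what is dangerous" approach concrete, here is a minimal sketch. It assumes the dangerous set is the ASCII control characters (0x00 through 0x1F, plus 0x7F); the helper name strip_control_chars is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete ASCII control characters (NUL through US, plus DEL).
# Assumption: this range is the "dangerous" set; adjust it to your own definition.
sub strip_control_chars {
    my ($string) = @_;
    $string =~ tr/\x00-\x1F\x7F//d;   # tr with /d deletes every listed character
    return $string;
}

my $dirty = "foo\nbar\0\tbaz";
my $clean = strip_control_chars($dirty);
print "$clean\n";   # prints "foobarbaz"
```

Note that this range also removes tabs and newlines; if you want to keep either, leave it out of the tr list.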
Update=> Aristotle reminded me that, as \s includes \n, these regexes will not strip newlines; that means strings sanitized with these will be unsafe if executed with a shell (e.g. system("$string");). This further shows that inclusion-matching isn't as good, in this case, as merely stripping "bad" data out.
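A small sketch of the newline problem, using a whitelist that keeps only \w and \s characters. The sample input and the tightened variant (allowing a literal space instead of \s) are my own assumptions, added only to show the difference.

```perl
use strict;
use warnings;

my $string = "ls\nrm important_file";

# Whitelist that keeps only word characters and whitespace.
(my $whitelisted = $string) =~ s/[^\w\s]+//g;

# \s matches \n, so the newline is still in the "sanitized" string.
print "newline survived\n" if $whitelisted =~ /\n/;

# One possible tightening (an assumption): allow only the space
# character instead of every \s.
(my $strict = $string) =~ s/[^\w ]+//g;
print "newline removed\n" unless $strict =~ /\n/;
```

Even with the tightened class, passing the result to a shell is still risky; the point here is only that \s lets the newline through.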
Anima Legato .oO all things connect through the motion of the mind
\w matches different things depending on your locale. If you have a German locale, for instance, it will match ß.
That is the danger of using Perl's shortcut character classes, as DrHyde pointed out to me.
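One way around the locale dependence is to spell out the allowed characters instead of relying on \w. A minimal sketch, assuming you only want ASCII letters, digits, underscore and space; the helper name keep_ascii_word_chars is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only ASCII letters, digits, underscore and space.
# The class is written out explicitly, so it does not depend on what \w means.
sub keep_ascii_word_chars {
    my ($string) = @_;
    $string =~ s/[^A-Za-z0-9_ ]+//g;
    return $string;
}

my $input = "Stra\x{DF}e_42!";   # \x{DF} is the sharp s (ß)
print keep_ascii_word_chars($input), "\n";   # prints "Strae_42"
```

On Perl 5.14 and later, the /a regex modifier is another option: it restricts \w, \d and \s to their ASCII meanings.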
"Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce