<?xml version="1.0" encoding="windows-1252"?>
<node id="417625" title="Re^2: Security techniques every programmer should know" created="2004-12-27 15:21:19" updated="2005-03-23 11:57:33">
<type id="11">
note</type>
<author id="416970">
legato</author>
<data>
<field name="doctext">
&lt;p&gt;Your code will call anything with whitespace an unsafe string.  While that's much better than no checking, how about:
&lt;code&gt;
$string =~ s/[^\w\s]+//g; ##add other allowed chars as needed
&lt;/code&gt;
That will sanitize all strings to contain only letters, digits, the underscore and whitespace.  A more complete regex (which would still not include Unicode or international chars) would be:
&lt;code&gt;
$string =~ s/[^\w\s\!\@\#\$\%\^\&amp;\*\(\)\\\`\~\-\+\=\,\.]+//g;
&lt;/code&gt;
(Yes, there's more escaping there than strictly necessary.)  Suddenly, that transliteration is looking a lot easier to maintain.  If your allowed set is "everything but nulls and control chars", then you're better off explicitly excluding the known control-char set.
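
&lt;p&gt;As a rough sketch of that exclusion approach (assuming the set to drop is NUL, the other ASCII C0 control characters and DEL; the sample string and variable names are made up for illustration, and the transliteration in the parent node may differ):
&lt;code&gt;
use strict;
use warnings;

## hypothetical input with an embedded NUL, a bell and a trailing newline
my $string = "ls -l\0 /tmp\x07\n";

## delete NUL, the other C0 control chars and DEL; keep everything else
(my $clean = $string) =~ tr/\x00-\x1F\x7F//d;

print "[$clean]\n";
&lt;/code&gt;
Note that the range also covers tab and newline, so both are removed along with the rest.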

&lt;p&gt;Denying everything and then allowing known-good input is a good general rule of thumb.  But in this case the "dangerous" items are a fixed set while the "safe" items are much more variable -- so it makes sense to simply remove what is dangerous.

&lt;p&gt;&lt;b&gt;&lt;i&gt;Update=&gt;&lt;/i&gt;&lt;/b&gt; [Aristotle] reminded me that, as &lt;tt&gt;\s&lt;/tt&gt; includes &lt;tt&gt;\n&lt;/tt&gt;, these regexes will not strip newlines; that means strings sanitized with these will be unsafe if executed with a shell (e.g. &lt;tt&gt;system("$string");&lt;/tt&gt;).  This further shows that inclusion-matching isn't as good, in this case, as merely stripping "bad" data out.
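
&lt;p&gt;A small sketch of the newline problem (the input string is a made-up example, and &lt;tt&gt;s/[^\w\s]+//g&lt;/tt&gt; stands in for any allow-list substitution built on &lt;tt&gt;\s&lt;/tt&gt;):
&lt;code&gt;
use strict;
use warnings;

## hypothetical hostile input: a second command hidden behind a newline
my $string = "report.txt\nrm -rf ~";

## allow-list using \s: the newline is whitespace, so it is kept
(my $kept = $string) =~ s/[^\w\s]+//g;
print "newline survived\n" if $kept =~ /\n/;

## allow-list using a literal space instead of \s: the newline is stripped
(my $safer = $string) =~ s/[^\w ]+//g;
print "newline removed\n" if $safer !~ /\n/;
&lt;/code&gt;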

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-416970"&gt;
&lt;p align=right&gt;Anima Legato&lt;br&gt;&lt;tt&gt;.oO all things connect through the motion of the mind&lt;/tt&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
417490</field>
<field name="parent_node">
417523</field>
</data>
</node>
