in reply to Untainting safely. (b0iler proofing?)

It depends entirely on the application. We can't say what "unsafe" means without knowing the context. If you're talking about shell metacharacters, it depends on which shell you're using (and which shell the user will be using), and the question is largely moot if you use the multiple-argument (list) form of calls like system and exec, which hands the arguments straight to the program so no shell expansion happens at all. If you're talking about unsafe text in HTML, we have modules like HTML::Entities for that.
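To make the point about the multiple-argument form concrete, here is a minimal sketch. The value in $untrusted is made up for illustration, and the child process is just the current perl ($^X) checking what it received; a real program would call whatever external command it needs.

```perl
use strict;
use warnings;

# Hypothetical untrusted value containing a shell metacharacter.
my $untrusted = 'report.txt; echo INJECTED';

# List form of system: the program ($^X, the running perl) is executed
# directly and $untrusted arrives as one literal argument. No shell
# parses the ';', so nothing after it is run as a command.
my $status = system(
    $^X, '-e',
    'exit(@ARGV == 1 && $ARGV[0] =~ /;/ ? 0 : 1)',
    $untrusted,
);

print $status == 0
    ? "argument arrived intact, semicolon and all\n"
    : "child did not see the argument as expected\n";
```

The single-string form, system("cmd $untrusted"), is the one that goes through the shell when the string contains metacharacters, and that is where the list of "dangerous" characters starts to depend on which shell is in use.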

Basically, identify what you're going to be doing with the data, and then figure out how you're going to ensure that this untrusted data is safe.

And no matter how you approach it, don't think of your algorithm as being built to remove bad things. Build it to permit safe things. If this means doing a tr/a-zA-Z0-9_-//cd, then that's what you have to do.
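As a rough illustration of the tr/// mentioned above, the sketch below deletes every character that is not in the permitted set. The variable names and the sample string are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical untrusted input.
my $name = 'my file; rm -rf /.txt';

# c complements the set and d deletes, so everything that is NOT a
# letter, digit, underscore, or hyphen is removed from the copy.
(my $clean = $name) =~ tr/a-zA-Z0-9_-//cd;

print "$clean\n";    # prints "myfilerm-rftxt"
```

Whether that character set is the right one still depends on what the data will be used for; the point is only that the set lists what is permitted rather than what is forbidden.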

Re: Re: Untainting safely. (b0iler proofing?)
by Jenda (Abbot) on Jun 25, 2002 at 20:14 UTC

    I think the last paragraph should be highlighted. Do not remove bad things. Permit safe things.

    A few weeks ago, in a reply to someone on a similar topic, I wrote:

    1. There is NO single list of dangerous characters. What characters are dangerous depends on the action you do with the data.
    2. If you or someone else creates a list of suspicious characters and tests whether the data contain any of them, you are NOT safe. You are sure to forget some character, and there is sure to be something you've never heard of that can go wrong.
    3. Always test whether the data contain ONLY ALLOWED characters. And allow only the characters you must.
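
The three rules above can be sketched as a check that accepts a value only when every character is on the allowed list. The helper name allowed_only and the particular character class are assumptions made for this example; the right class depends on what you do with the data. Under taint mode (-T), it is the regex capture ($1) that yields an untainted value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value if it consists ONLY of allowed
# characters, otherwise undef. The character class is an assumption;
# keep it as small as the application permits.
sub allowed_only {
    my ($value) = @_;
    return undef unless defined $value;

    # \A and \z anchor the whole string, so a single stray character
    # (including a trailing newline) causes rejection.
    return $value =~ /\A([A-Za-z0-9_-]+)\z/ ? $1 : undef;
}

for my $candidate ('user_42', 'user 42; rm -rf /', "user_42\n") {
    my $ok = allowed_only($candidate);
    print defined $ok ? "accepted: $ok\n" : "rejected\n";
}
```

Rejecting the whole value, rather than quietly stripping the unexpected characters, is a design choice in this sketch: it makes bad input visible instead of silently turning it into something else.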