|Perl: the Markov chain saw|
Security techniques every programmer should knowby Juerd (Abbot)
|on Dec 26, 2004 at 22:47 UTC||Need Help??|
These have all been mentioned numerous times, but many programmers still don't understand the risks of using power tools. Because I think everyone knows why it is important to program with security in mind, I'll just begin without any further introduction.
Know your environment
Perl is written in C and doesn't prevent you from shooting your own feet. This has some very dangerous implications that not every Perl programmer is aware of. Even though the language you use, you still need some basic knowledge of C to create secure programs.
The platform perl runs on is also important. Linux asks for different security measures than Win32. Even the filesystem that is used can be important: are FooBar and foobar the same file, or not? If you program for one specific platform, make it work only on that one.
Your program will be used by others
Or maybe not. But always assume the worst. If not because use by others is a probable future, then to keep yourself focussed on the important issues.
Security comes first
Your boss may tell you that the first priority is that everything works, but it is your job to tell him he's wrong. If something is wrong, make sure the program dies before more goes wrong. It's better to have a program that does nothing at all than to have a program that does everything that is expected from it and provides a backdoor for evil-doers as a free bonus feature.
Encode and escape!
Of every string that comes in, you should know the character set and, more importantly, its encoding. Before doing anything with the data, it's best to convert it to Perl's internal format.
For example, if your input is %-encoded UTF-8, use:
The reverse is also true. Before outputting a string, make sure it is in the right format. For example, to output the $string we just decoded in an utf-8 encoded HTML-document, we can use:
So even though input and output are utf-8, we still explicitly decode and encode it from and to Perl's internal format. If the output was part of a URL, we'd also unescape and then re-escape the data. This is to make sure no strange octet (byte) slips through. Another benefit is that in between, you have a string that normal Perl functions can manipulate without needing to have special facilities to handle a certain character set or encoding.
(Note: Perl's internal format happens to also be utf-8, but you should never assume this. Always explicitly decode and encode!)
If you don't escape properly, your program is prone to injection attacks. These include, but are not limited to:
Every output format requires its own escaping. Even better than escaping data, though, is preventing interpolation when possible by using placeholders (DBI) or a list variant of a function (system, exec, open). This skips escaping and unescaping by using a more direct mode of communication. If internally it is still implemented as escaping+unescaping (DBI::mysql), at least you know knowledgeable people take care of it.
Null bytes are scary
Several control characters are scary, because they often have special meaning in certain string formats, but the null byte is the most scary of all. In C, a null byte (\0) indicates where a string ends. However, in Perl, it's just a normal character. This has advantages and disadvantages. The disadvantages are more important to be aware of. Many of Perl's functions are implented using C functions, and in general, you can (and SHOULD!) assume they're not removing the null bytes for you.
Suppose you have written a CGI-script that does nothing more than display a page from the current directory. Storing data in the working directory is often a mistake in itself, but for this contrived example, let's ignore that.
You disallow anything that has a slash in it, and ".html" is used in the read_file call, so only .html-files from the current directory can be read, right?
Wrong. Just poisoning the data with a null byte is enough to evade the .html restriction. URI-encoded, a null byte is %00.
The underlying function is a C function. It thinks the string ends where the null byte is. So it opens page.cgi and ignores the "\0blah!.html" part. But wasn't File::Slurp a pure Perl module? Yes, it is. But it uses sysopen internally! Don't let the "sys" part fool you: open uses the same internal C function.
Instead reading through every module and Perl's source to find out what it uses, just remove all null bytes unless you have a good reason to keep them around. While you're at it, remove other control characters as well.
I skipped 0x0a and 0x0d because they are LF (line feed) and CR (carriage return), used for line endings. Depending on the application you write, you may need to exclude more characters, like vertical and horizontal tabs and form feeds.
A good way to make sure you test each string before using it externally is to use Perl's taint mode. It is invoked with -T. The previous example would only need one a small change.
Note that you should NEVER blindly use . or [^...] in your untaint regex. Whitelisting is much safer than blacklisting, and should have preference. For example ^(.+)\z and ^([^/.]+)\z still allow the dangerous null byte. I use \z instead of $, because $ allows \n (newline) just before the end. Know your tools, so learn to use regexes properly!
Please, add your own generic security related advice below. Preferrably with examples of how easy it is to get wrong. There is much more than I have just mentioned. If you know revelant PM nodes or external URLs, link to them. Let's have all the important information in one place.
But realise that knowing what you're doing, and thus reading documentation, is much better than reading only about the risks involved.
Considered by demerphq: "Section titles are too big"