|Problems? Is your data what you think it is?|
Good-bye Unix filter idiomby martin (Pilgrim)
|on Sep 06, 2012 at 19:12 UTC||Need Help??|
One of the perl idioms I have been using hundreds of times is the diamond operator in a while loop condition:
This idiom will make my program act like a Unix-style filter program. It will take its input from files named on the command line or, in the absence of arguments, standard input, and it will write to standard output. A dash as a filename also means standard input.
What I like about idioms like this is that they put boring stuff behind the scenes and leave the essence of my code, the parts that are specific to the task at hand, in the limelight.
This use of the diamond operator, however, is flawed. It makes programs inflexible and insecure.
More often than not, I need a filehandle for my input stream. For example, I may need to configure IO layers. The diamond operator provides a filehandle, ARGV, but only after having started to read from each file, which would be too late. The open pragma does not help me there either, as behind-the-scenes open is out of my scope and thus not affected by a pragma.
I can work around this by forcing the next file to be opened without consuming the first chunk of input, like this:
The eof builtin function with empty parentheses checks whether there is more input available for the diamond operator, which means it must open the next file and try to read from it, thereby updating the ARGV handle. This way I can sneak in a binmode command before the diamond operator itself can grab some of the file content. The eof without parentheses prevents me from doing this more than once for each file, as it checks whether the file currently being read has already been exhausted.
To summarize, there is a way to do such things, but the idiomatic conciseness will go out of the window.
I am concerned that my programs should only do things I intended them to. For example, a typical filter program should not run arbitrary code fed to it by the user.
That is why I avoid using open with two-argument syntax. Two-argument open will interpret its second argument, chopping whitespace from it, determining access modes from leading or trailing special characters, and even running external programs to set up an input or output pipeline. This can be hazardous.
The diamond operator, however, happily uses two-argument open to processes the contents of the @ARGV array, which by default contains the command line arguments supplied by the user. This means, that an argument "< foo" makes my program read a file named "foo", an argument "> foo" makes it clobber a file named "foo", and an argument "rm * |" makes it run a program that on my platform happens to remove every file in the current directory.
In order to squash this type of exploit, I need to sanitize @ARGV before I use the diamond operator. The CPAN module ARGV::readonly will try to do this for me, though not in a platform-independent way. Running perl in taint mode will prevent opening user-supplied filenames with access modes other than readonly, but on some platforms the only way to turn on taint mode is a command line flag supplied by the same user I am trying to guard myself against. This means I can't enforce it.
As an alternative, the CPAN module Iterator::Diamond looks promising. Unfortunately, it has some wrinkles wanting to get ironed out before it can act as a true replacement for the builtin construct (or indeed pass its own test suite under recent perl versions). This should be a matter of only some minor edits, though. In fact, I have just posted a patch to the module's bug tracker queue with a couple of suggestions.
With a working version of Iterator::Diamond, the old idiom could be replaced by this:
As there is now a constructor call to set things up, there is also a good place where future expansions could provide additional features such as an option to specify input layers for all files, say. I am looking forward to using this module or something similar.
Meanwhile, I'll use explicit three-argument open when I can afford it, and while (<STDIN>) when I am in a hurry.
Good-bye, diamond operator.