I'm very much enjoying reading
Perl Best Practices.
I find myself scribbling notes after almost every page;
mainly concrete ideas for improving the quality and especially
the maintainability of production Perl code at work.
To me, this is the most important Perl book published in years,
because it helps me sell Perl as a maintainable language to management.
However, the In-situ Arguments ("Allow the same filename to be
specified for both input and output") practice described on page 304
in chapter 14 has me scratching my head, for it seems to me to be
more a "dangerous practice" than a "best practice".
Here is a test program, derived from the example given in the book:
# The idea is to use the Unix unlink trick to write the
# destination file without clobbering the source file
# (in the case where the source and destination are the same file).
use strict;
use warnings;
my $source_file = 'fred.tmp';
my $destination_file = $source_file;
# Open both filehandles...
use Fatal qw( open );
open my $src, '<', $source_file;
unlink $destination_file;
open my $dest, '>', $destination_file;
# Read, process, and output data, line-by-line...
while (my $line = <$src>) {
    print {$dest} transform($line);
}
# This is my test version of the transform() function;
# the sleep is there for convenience in testing what happens
# if you interrupt proceedings mid stream by pressing CTRL-C.
sub transform {
    sleep 1;
    return "hello:" . $_[0];
}
My problems with this code are:
- Consider what happens if the while loop is interrupted: by a power failure, by the user pressing CTRL-C, or because the print fails (disk full or disk quota exceeded, say). The source file has already been unlinked, so you are left with a partially written destination file and have probably lost data. Worse, you don't know you've done it. And when you re-run the script after the interruption, you may spend a lot of time trying to figure out what has happened to your data. That is, this idiom is not "re-runnable".
- The unlink trick used to avoid clobbering the input file works on Unix, but may not work on other operating systems. In particular, when run on Windows, the above program clobbers the source file. That is, this idiom is not portable.
As discussed in "Re-runnably editing a file in place", it seems sounder to first write a temporary file.
Once you're sure the temporary file has been written without error
(and after the permissions on the temporary are updated to match the original)
you then (atomically) rename the temporary file to the original.
In that way, if writing the new file is interrupted for any reason,
you can simply re-run the program without losing any data.
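To make that concrete, here is a minimal sketch of the temporary-file approach. It is only an illustration, not code from the book: rewrite_in_place() is a hypothetical helper name, the 'rewriteXXXXXX' template is arbitrary, and I am assuming the temporary file can be created in the same directory as the original (so that the rename never crosses a filesystem boundary). The rename is atomic on POSIX systems; I make no such claim for other platforms.

```perl
use strict;
use warnings;
use File::Temp     qw( tempfile );
use File::Basename qw( dirname );

# Hypothetical helper: rewrite $file via a temporary file, so the
# original is left untouched until the new contents are complete.
sub rewrite_in_place {
    my ($file, $transform) = @_;

    open my $src, '<', $file
        or die "Cannot open '$file' for reading: $!";

    # Create the temporary file next to the original (assumption: that
    # directory is writable), keeping rename on a single filesystem.
    my ($dest, $tmp_name)
        = tempfile( 'rewriteXXXXXX', DIR => dirname($file), UNLINK => 0 );

    my $ok = eval {
        while (my $line = <$src>) {
            print {$dest} $transform->($line)
                or die "Cannot write to '$tmp_name': $!";
        }
        # close can report deferred write errors (disk full, quota).
        close $dest or die "Cannot close '$tmp_name': $!";
        close $src  or die "Cannot close '$file': $!";

        # Copy the original's permission bits to the temporary file.
        my $mode = (stat $file)[2] & 07777;
        chmod $mode, $tmp_name
            or die "Cannot chmod '$tmp_name': $!";

        # Only now replace the original.
        rename $tmp_name, $file
            or die "Cannot rename '$tmp_name' to '$file': $!";
        1;
    };
    if (!$ok) {
        my $error = $@;
        unlink $tmp_name;    # best-effort cleanup; original is intact
        die $error;
    }
    return;
}
```

Called as rewrite_in_place('fred.tmp', \&transform), this leaves fred.tmp unchanged if anything goes wrong before the rename. If the process is killed outright (CTRL-C, power failure), a stray rewriteXXXXXX file may be left behind, but the original data is still there and the program can simply be re-run.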
Please let me know what I've overlooked.