http://www.perlmonks.org?node_id=488859


in reply to Re: Perl Best Practices book: is this one a best practice or a dodgy practice?
in thread Perl Best Practices book: is this one a best practice or a dodgy practice?

... the second suggested solution--using the IO::InSitu module--does use a back-up strategy to ensure that data is not lost if the program abends.

True. But it is still not re-runnable, which makes it dangerous in the hands of naive users who interrupt a program with CTRL-C and then re-run it. If they do that, they may suffer permanent data loss without even being aware of it.

It seems to me that you can get re-runnability with little extra effort: simply write the temporary file first and only overwrite the original (via atomic rename) after the temporary has been successfully written.
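A minimal sketch of that approach follows. It is not how IO::InSitu works; rewrite_via_temp is a made-up helper name, and the sketch assumes the temporary file can be created in the same directory as the original, so that rename never has to cross a filesystem boundary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper (not part of IO::InSitu): pass every line of $file
# through $transform, writing the result to a temporary file first.
# The original is replaced only after the temporary is complete.
sub rewrite_via_temp {
    my ($file, $transform) = @_;

    open my $in, '<', $file or die "Cannot read $file: $!";

    # Same directory as the original, so rename() stays on one filesystem.
    my ($out, $tmp_name) = tempfile('rewriteXXXXXX', DIR => dirname($file));

    my $ok = eval {
        while (my $line = <$in>) {
            print {$out} $transform->($line) or die "Write failed: $!";
        }
        close $out or die "Cannot close $tmp_name: $!";
        1;
    };
    if (!$ok) {
        my $err = $@;
        close $out;          # harmless if already closed
        unlink $tmp_name;    # discard the partial temporary
        die $err;
    }
    close $in;

    # The original is untouched until this point. An interrupt before the
    # rename leaves it intact (plus a stray temporary file to clean up).
    rename $tmp_name, $file or die "Cannot rename $tmp_name to $file: $!";
    return;
}
```

Something like `rewrite_via_temp('fred.tmp', sub { "hello:" . $_[0] })` would then stand in for the open_rw loop. Note that File::Temp creates the temporary with restrictive permissions, so the replaced file does not inherit the original's mode (or inode); a real implementation would have to decide what to do about that.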

As a test, I pressed CTRL-C midway through running this test program:

    use strict;
    use warnings;
    use IO::InSitu;

    my $infile_name  = 'fred.tmp';
    my $outfile_name = $infile_name;
    my ($in, $out) = open_rw($infile_name, $outfile_name);
    for my $line (<$in>) {
        print {$out} transform($line);
    }

    # Try pressing CTRL-C while file is being updated.
    sub transform {
        sleep 1;
        return "hello:" . $_[0];
    }

This is what I saw:
    total 20
    drwxrwxr-x  2 andrew andrew 4096 Sep  3 14:44 ./
    -rw-rw-r--  1 andrew andrew    0 Sep  3 14:42 fred.tmp
    -rw-rw-r--  1 andrew andrew  191 Sep  3 14:42 fred.tmp.bak
    drwxrwxr-x 11 andrew andrew 4096 Sep  3 14:42 ../
    -rw-rw-r--  1 andrew andrew  288 Sep  3 14:41 tsitu1.pl

Now, of course, blindly re-running the test program resulted in permanent data loss (an empty fred.tmp file in this example).
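One cheap defence on the caller's side is to refuse to run while a backup from an earlier, interrupted run is still lying around. A rough sketch, assuming the backup is always the input file name plus ".bak" as in the listing above (I have not checked whether IO::InSitu guarantees that naming); stale_backup_for is a made-up helper, not part of the module:

```perl
use strict;
use warnings;

# Hypothetical guard, not part of IO::InSitu.
# Assumption: an interrupted run leaves its backup at "$file.bak".
# Returns the backup's path if one exists, otherwise undef.
sub stale_backup_for {
    my ($file) = @_;
    my $backup = "$file.bak";
    return -e $backup ? $backup : undef;
}
```

Called before open_rw, for example as `if (defined(my $bak = stale_backup_for($infile_name))) { die "Found $bak from an earlier run; restore it first\n" }`, this turns a silent loss into a loud failure. It does nothing for a script that nobody is watching, of course.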

Update: Just to clarify, this problem is broader than the naive user scenario given above and may bite you anytime a script is automatically rerun after an interruption -- a script that is run automatically at boot time, for example.

Further update: More detail on Win32 rename, related to tye's response below, can now be found at Re^7: Read in hostfile, modify, output.

Re^3: Perl Best Practices book: is this one a best practice or a dodgy practice?
by TheDamian (Vicar) on Sep 03, 2005 at 05:35 UTC
    Which makes it dangerous in the hands of naive users who interrupt a program with CTRL-C, then re-run it. If they do that, they may suffer permanent data loss without even being aware of it.
    To quote Oscar Wilde's Miss Prism: "What a lesson for him! I trust he will profit by it." ;-)
    It seems to me that you can get re-runnability with little extra effort: simply write the temporary file first and only overwrite the original (via atomic rename) after the temporary has been successfully written.
    The IO::InSitu module could certainly be reworked to operate that way, except that it would then fail to preserve the inode of the original file. :-( Perhaps I will add an option to allow it to work whichever way (i.e. "inode-preserving" vs "rerunnable") the user prefers.

    Bear in mind though that an "atomic rename" isn't really atomic under most filesystems, so even this approach still isn't going to absolutely guarantee rerunnability.

      Bear in mind though that an "atomic rename" isn't really atomic under most filesystems

      rename is atomic on POSIX systems. Win32 has atomic rename and I just checked and rename uses it on modern Win32 operating systems. That qualifies as "most" of the Perl universe in my book (covering the two most common Perl environments, even if TheDamian chooses to call one of the top two "obscure"). Perhaps you have evidence to the contrary or perhaps you are thinking of pre-rename methods using link/unlink?

      - tye        

        My mistake. I hadn't realized we were talking about "atomic rename(1)", rather than more general renaming (such as link/unlink sequences). Sorry for the confusion.