Re: Perl Best Practices book: is this one a best practice or a dodgy practice?

by TheDamian (Vicar)
on Sep 03, 2005 at 02:54 UTC


in reply to Perl Best Practices book: is this one a best practice or a dodgy practice?

Please let me know what I've overlooked.
Well, if the suggested idiom doesn't work under Windows, I'd argue that either Windows (or possibly Perl) is broken in that regard. But, of course, that doesn't solve your problem. I guess it's just as well that Chapter 18 suggests testing everything before you deploy it! ;-)

Although the recommendation in question doesn't claim to solve the problem of rerunnability, as it happens, the second suggested solution--using the IO::InSitu module--does use a back-up strategy to ensure that data is not lost if the program abends.

Replies are listed 'Best First'.
Re^2: Perl Best Practices book: is this one a best practice or a dodgy practice?
by eyepopslikeamosquito (Archbishop) on Sep 03, 2005 at 05:09 UTC

    ... the second suggested solution--using the IO::InSitu module--does use a back-up strategy to ensure that data is not lost if the program abends.

    True. But it is still not re-runnable. Which makes it dangerous in the hands of naive users who interrupt a program with CTRL-C, then re-run it. If they do that, they may suffer permanent data loss without being aware of it.

    It seems to me that you can get re-runnability with little extra effort: simply write the temporary file first and only overwrite the original (via atomic rename) after the temporary has been successfully written. (A sketch of this approach follows the test below.)

    As a test, I pressed CTRL-C midway through running this test program:

        use strict;
        use warnings;
        use IO::InSitu;

        my $infile_name  = 'fred.tmp';
        my $outfile_name = $infile_name;

        my ($in, $out) = open_rw($infile_name, $outfile_name);
        for my $line (<$in>) {
            print {$out} transform($line);
        }

        # Try pressing CTRL-C while the file is being updated.
        sub transform {
            sleep 1;
            return "hello:" . $_[0];
        }
    This is what I saw:
        total 20
        drwxrwxr-x  2 andrew andrew 4096 Sep  3 14:44 ./
        -rw-rw-r--  1 andrew andrew    0 Sep  3 14:42 fred.tmp
        -rw-rw-r--  1 andrew andrew  191 Sep  3 14:42 fred.tmp.bak
        drwxrwxr-x 11 andrew andrew 4096 Sep  3 14:42 ../
        -rw-rw-r--  1 andrew andrew  288 Sep  3 14:41 tsitu1.pl
    Now, of course, blindly re-running the test program resulted in permanent data loss (an empty fred.tmp file in this example).
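
    For reference, here is a minimal sketch of the write-then-rename approach (the temporary file name and the error handling are my own, for illustration; a real script might prefer File::Temp):

        use strict;
        use warnings;

        my $infile_name = 'fred.tmp';
        my $tmp_name    = "$infile_name.new";   # illustrative temporary name

        open my $in,  '<', $infile_name or die "open '$infile_name': $!";
        open my $out, '>', $tmp_name    or die "open '$tmp_name': $!";
        print {$out} transform($_) for <$in>;
        close $out or die "close '$tmp_name': $!";   # catch write errors before renaming
        close $in;

        # Only now clobber the original. If the program is interrupted at any
        # earlier point, the original file is untouched, so re-running is safe.
        rename $tmp_name, $infile_name
            or die "rename '$tmp_name' -> '$infile_name': $!";

        sub transform { sleep 1; return "hello:" . $_[0]; }

    Note that the temporary file must live on the same filesystem as the original, or the final rename cannot be atomic.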

    Update: Just to clarify, this problem is broader than the naive user scenario given above and may bite you anytime a script is automatically rerun after an interruption -- a script that is run automatically at boot time, for example.

    Further update: More detail on Win32 rename, related to tye's response below, can now be found at Re^7: Read in hostfile, modify, output.

      Which makes it dangerous in the hands of naive users who interrupt a program with CTRL-C, then re-run it. If they do that, they may suffer permanent data loss without being aware of it.
      To quote Oscar Wilde's Miss Prism: "What a lesson for him! I trust he will profit by it." ;-)
      It seems to me that you can get re-runnability with little extra effort: simply write the temporary file first and only overwrite the original (via atomic rename) after the temporary has been successfully written.
      The IO::InSitu module could certainly be reworked to operate that way, except that it would then fail to preserve the inode of the original file. :-( Perhaps I will add an option to allow it to work whichever way (i.e. "inode-preserving" vs "rerunnable") the user prefers.
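
      To see the trade-off concretely, here is a small POSIX-only sketch of my own (file names and output format are illustrative; element 1 of stat is the inode number):

          use strict;
          use warnings;

          open my $fh, '>', 'demo.txt' or die $!;
          print {$fh} "v1\n";
          close $fh;
          my $inode = (stat 'demo.txt')[1];

          # An in-place rewrite (the IO::InSitu way) keeps the original inode ...
          open $fh, '>', 'demo.txt' or die $!;
          print {$fh} "v2\n";
          close $fh;
          printf "in-place: inode %d -> %d\n", $inode, (stat 'demo.txt')[1];

          # ... whereas write-then-rename produces a new inode, so hard links
          # still point at the old data and ownership/permissions may change.
          open $fh, '>', 'demo.txt.new' or die $!;
          print {$fh} "v3\n";
          close $fh;
          rename 'demo.txt.new', 'demo.txt' or die $!;
          printf "rename:   inode %d -> %d\n", $inode, (stat 'demo.txt')[1];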

      Bear in mind though that an "atomic rename" isn't really atomic under most filesystems, so even this approach still isn't going to absolutely guarantee rerunnability.

        Bear in mind though that an "atomic rename" isn't really atomic under most filesystems

        rename is atomic on POSIX systems. Win32 has an atomic rename, and I just checked that Perl's rename uses it on modern Win32 operating systems. That qualifies as "most" of the Perl universe in my book (covering the two most common Perl environments, even if TheDamian chooses to call one of the top two "obscure"). Perhaps you have evidence to the contrary, or perhaps you are thinking of pre-rename methods using link/unlink?

        - tye        

Re^2: Perl Best Practices book: is this one a best practice or a dodgy practice?
by eyepopslikeamosquito (Archbishop) on Sep 03, 2005 at 06:47 UTC

    Well, if the suggested idiom doesn't work under Windows, I'd argue that either Windows (or possibly Perl) is broken in that regard.

    Good luck getting Bill Gates or p5p to acknowledge that their unlink is broken. ;-) The Perl documentation for the unlink function is annoyingly vague: "Deletes a list of files. Returns the number of files successfully deleted."

    The POSIX unlink semantics are clearer, if difficult to implement on non-Unix systems, as noted in the djgpp POSIX unlink specification:

    The POSIX specification requires this removal to be delayed until the file is no longer open. Due to problems with the underlying operating systems, this implementation of unlink does not fully comply with the specs; if the file you want to unlink is open, you're asking for trouble -- how much trouble depends on the underlying OS.

    I might add that the ANSI C remove function does not appear to mandate POSIX semantics: calling remove on an open file on Linux works (a la POSIX unlink), while on Windows it fails.
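
    As a rough demonstration of the difference (my own test sketch), the following succeeds end-to-end on a POSIX system, because the unlinked file's data persists until the last open handle is closed, whereas on Windows the unlink itself typically fails:

        use strict;
        use warnings;

        open my $fh, '>', 'gone.txt' or die $!;
        print {$fh} "still readable\n";
        close $fh;

        open $fh, '<', 'gone.txt' or die $!;
        unlink 'gone.txt' or warn "unlink failed: $!";   # fails on Windows

        # POSIX: the directory entry is gone, but the open handle can still
        # read the file's data until the handle is closed.
        print scalar <$fh>;
        close $fh;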

    BTW, I found a related discussion of atomic file update in the "Atomic in-place edit" section of This Week on perl5-porters (18-24 November 2002). Not sure whether anything was resolved, however. (Update: apart from MJD submitting a Bug report for the Universe. :-)

Re^2: Perl Best Practices book: is this one a best practice or a dodgy practice?
by pg (Canon) on Sep 03, 2005 at 03:56 UTC
    "guess it's just as well that Chapter 18 suggests testing everything before you deploy it! ;-)"

    It is kind of far-fetched to talk about testing here. The best practice is to find "bugs" as early as possible; the later they are found and fixed, the higher the cost.

    The particular bad practice that we are discussing in this case can easily be spotted as early as specification review (if the process requires one to document such details), or at least during peer code review. If a simple issue like this can actually pass all the guarding processes and get as far as testing, the processes themselves should be reviewed.

    Update: Having read TheDamian's reply, I have now seen that big ;-). Okay, my fault; please ignore the first sentence of this post.

      It is kind of far-fetched to talk about testing here.
      Hmmmmm. Maybe I didn't make the smiley big enough. Let's try again:
      guess it's just as well that Chapter 18 suggests testing everything before you deploy it! ;-)
      The point I was trying to make is not that you should rely on the testing to catch this problem, but that, if you don't catch the problem earlier, the testing phase should still do so.

      In other words, (just exactly as you say) best practices ought to be an integrated process, so even if one practice introduces a problem, another practice should catch it. Which is precisely what my book advocates.

      One hopes that programmers do at least some testing _before_ code review.
