PerlMonks  

Re: Re: Re: Super Find critic needed

by BrowserUk (Pope)
on Jun 30, 2003 at 14:54 UTC


in reply to Re: Re: Super Find critic needed
in thread Super Find critic needed

By "server crash", I meant that the machine (server or workstation) where the code is running stops because of hardware failure, or power failure, or because you knock the off-switch, or even because an administrator accidentally kills your process while cleaning up zombies at 4 am. I.e. events that you cannot detect from within your script.

The basic mechanism to avoid this is to make a backup of the original before overwriting it with the modified version. There are several different sequences of copying, renaming, deleting and overwriting that you can use. Some of these are "better" than others, but I've yet to see one that completely eliminates the risks, though they reduce the window for failure to the point of reasonable risk.
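One such sequence, sketched here as a hedged, untested example (the file name and content are hypothetical stand-ins for the real script's data): back up the original first, write the new content to a temporary file, and only then move it over the original.

```perl
use strict;
use warnings;
use File::Copy qw(copy move);

# Hypothetical demonstration file.
my $file = 'data.txt';
open( my $fh, '>', $file ) or die "Couldn't create $file: $!";
print $fh "original content\n";
close( $fh );

# 1. Back up the original before touching it.
copy( $file, "$file.bak" ) or die "Couldn't back up $file: $!";

# 2. Write the modified content to a temporary file...
open( my $out, '>', "$file.tmp" ) or die "Couldn't create $file.tmp: $!";
print $out "modified content\n";
close( $out ) or die "Couldn't close $file.tmp: $!";

# 3. ...then move it over the original. A crash at any point
#    leaves either the backup or the original intact on disk.
move( "$file.tmp", $file ) or die "Couldn't replace $file: $!";
```

Even here a crash between steps 2 and 3 leaves a stale `.tmp` file behind, so the window is reduced rather than eliminated.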

For a couple of neat ways to use $^I (see perlvar and perlrun -i) to get perl to back up your files for you, see My pattern for "in-place edit" for many files and Put your inplace-edit backup files into a subdir, both from Merlyn.
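A minimal sketch of the $^I mechanism, with a hypothetical file name and pattern: setting $^I makes the <> loop rewrite each file in @ARGV in place, keeping the original under the given backup extension.

```perl
use strict;
use warnings;

# Hypothetical demonstration file.
open( my $fh, '>', 'host.conf' ) or die "Couldn't create host.conf: $!";
print $fh "server is servername here\n";
close( $fh );

{
    # localised so normal <> / STDOUT behaviour returns afterwards
    local $^I   = '.bak';
    local @ARGV = ( 'host.conf' );
    while( <> ) {
        s/\bservername\b/NEWNAME.com/g;
        print;    # goes to the replacement file, not STDOUT
    }
}
```

The original survives as host.conf.bak, so a failed run can be backed out by renaming the backups over the edited files.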

Perhaps the best way to be safe is to make a copy of the directory structure, run your script against that, and then copy the directory structure over the original once you're sure it has been successful. Perhaps you are already doing (or intending to do) this, in which case you can ignore this advice.
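A core-modules-only sketch of mirroring a tree before editing it; the directory names here are hypothetical. You would run the edits against the working copy and copy it back only after verifying the results.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

my ( $src, $dst ) = ( 'tree', 'tree.work' );

# Set up a small demonstration tree.
make_path( "$src/sub" );
open( my $fh, '>', "$src/sub/a.txt" ) or die "Couldn't create file: $!";
print $fh "hello\n";
close( $fh );

# Walk the source tree, recreating directories and copying files.
find( { no_chdir => 1, wanted => sub {
    my $rel    = File::Spec->abs2rel( $File::Find::name, $src );
    my $target = File::Spec->catfile( $dst, $rel );
    if( -d $File::Find::name ) {
        make_path( $target );
    }
    else {
        copy( $File::Find::name, $target )
            or die "Couldn't copy $File::Find::name: $!";
    }
} }, $src );
```

CPAN's File::Copy::Recursive offers a one-call dircopy that does the same job, if installing a module is an option.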

The other thing I noticed in your script is that every file will be overwritten regardless of whether any actual changes were made. This is likely to give you problems when you come to verify that the changes made were correct, or worse, make it hard to undo any mistakes, as you won't know whether one file or every file was changed.

There are many, many ways of writing your script, and many different philosophies on the best way to do it. Perhaps the best advice I can give you is to sit down with your code, mentally or physically on paper, work through each step of the process, and imagine what state your files will be left in if a power cut occurs at that step. Decide how much of a risk that presents in your system, and how much effort you should expend to prevent it from happening.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



Re: Re: Re: Re: Super Find critic needed
by Anonymous Monk on Jun 30, 2003 at 16:04 UTC
    Yes, I am copying the entire directory and its contents before running the script.

    Also reference what you said:
    "The other thing I noticed in your script is that every file will be over written regardless of whether any actual changes were made or not. This is likely to give you problems when you come to verify that the changes made where correct, or worse, make it hard to undo any mistakes as you won't know whether 1 file or every file was changed"

    How would I change it so it just writes on files I am changing??

      Okay. Here is an untested modification of the second part of your script that reads the file into an array, applies the regexes to the array, and notes if any changes were made. If there were changes, it creates a new file, writes the changed content to it, and finally renames the new file over the old file.

      Files that do not contain anything that needs modifying will be untouched and retain their original modification timestamps, and the window for errors resulting from system failures is much reduced, though not completely eliminated.

      foreach my $file (@files) {
          open( FILE, '<', $file ) or die "Couldn't open $file: $!";
          @data = <FILE>;
          close( FILE );

          my $modified = 0;    # Assume no modification
          foreach (@data) {
              # Increment the flag if changes are made
              ++$modified if s/servername\.aa\.company\.zzzz\.com/NEWNAME\.com/gi;
              ++$modified if s/\bservername\.aa\.company\.com\b/NEWNAME\.com/gi;
              ++$modified if s/\bservername\b/NEWNAME\.com/gi;
          }

          # If we made no modifications,
          # leave the original file as is
          if( $modified ) {
              # Create a new file for the modified data
              open( FILE, '>', "$file.new" ) or die "Couldn't create $file.new: $!";
              print FILE for @data;
              close( FILE );

              # Then rename the new file over the old file,
              # effectively deleting the old.
              # There is still a window of opportunity for error
              # if the system crashes, but it is much smaller.
              rename "$file.new", $file;
          }
      }
      print "Total Count = $ct\n";



        Thanks I will now give it a try.
Re4: Super Find critic needed
by bbfu (Curate) on Jun 30, 2003 at 16:51 UTC

    Some of these are "better" than others, but I've yet to see one that completely eliminates the risks, though they reduce the window for failure to the point of reasonable risk.

    I would (perhaps naively) think that renaming the original file (renaming should be atomic, no?) to something like "$filename.$$", then reading / munging / writing to "$filename", and only deleting "$filename.$$" when the new filehandle is closed (and thus its buffers flushed, as well as Perl can make them) would completely eliminate the risk. The process could stop at any point and, at worst, you'd have a partially munged new file and the original file both extant. Assuming, of course, that you have a sufficiently paranoid filesystem.
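    The scheme described above might be sketched like this (untested, assuming rename is atomic on the filesystem in use; the file name and substitution are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical demonstration file.
my $file = 'config.txt';
open( my $fh, '>', $file ) or die "Couldn't create $file: $!";
print $fh "host servername\n";
close( $fh );

my $bak = "$file.$$";

# 1. Atomically move the original out of the way.
rename $file, $bak or die "Couldn't rename $file: $!";

# 2. Read / munge / write back under the original name.
open( my $in,  '<', $bak  ) or die "Couldn't read $bak: $!";
open( my $out, '>', $file ) or die "Couldn't create $file: $!";
while( <$in> ) {
    s/\bservername\b/NEWNAME.com/g;
    print $out $_;
}
close( $in );
close( $out ) or die "Couldn't flush $file: $!";

# 3. Only after the new file is safely closed, drop the backup.
unlink $bak or die "Couldn't delete $bak: $!";
```

    A crash before step 3 leaves the backup intact alongside whatever was written, which is where the recovery question below comes in.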

    I'm not entirely sure I'm not missing something, so please enlighten me if I am. =)

    bbfu
    Black flowers blossom
    Fearless on my breath

      If the process is interrupted after the new file has been created, but before the old file has been deleted, regardless of whether the new file was properly written and closed, then when the system is restored and the script is re-run, the program will again find a file by the original name and rename it to "$filename.$$". If the new file was completely written and properly flushed, then no harm done: it will just be processed as though it were the original file, no further changes will be made, and you're back on track.

      However, if the new file was only partially written when the interruption occurred, then without an explicit check for the existence of a file called "$filename.$$", perl's rename function will silently blow the first backup away, overwriting it with the partially written version.

      This implies that perl's rename is implemented as either a copy, or a delete followed by a rename, as the OS rename (whether the command or the underlying system call) will not allow you to rename a file if a file with the new name already exists. At least this is the case under Win32; I'm not sure of the situation with other OSes.

      It therefore falls to the programmer using Perl's rename to check for and handle the situation where the new name already exists, using -e or similar. Once this check is in place, you still need code to handle the situation where the backup does exist: arrange to delete the (potentially partial) new file created on the last pass and restore the backup. Perl's rename will do this ostensibly in one step, but as I just noted, in reality, at least on some systems, there are two steps involved: a delete, followed by a rename. If a second interruption occurs between these two steps, then you get the situation where you have a backup with no original. If the File::Find or globbing process used to build the file list uses anything other than a fully wild match criterion, then a third pass won't even see the backup, as it will only be looking for the original, which no longer exists. So whilst no data has been lost, it will require a manual intervention to restore it.
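      The -e guard described above might look something like this (a hedged sketch; the function name, file names, and the restore policy are hypothetical choices, not the poster's code):

```perl
use strict;
use warnings;

# Before renaming, detect a leftover backup from an interrupted
# earlier run and handle it explicitly, rather than letting
# rename silently clobber it.
sub safe_backup {
    my ( $file, $bak ) = @_;

    if( -e $bak ) {
        # A previous run died between writing and cleanup, so the
        # file under the original name may be a partial rewrite.
        # Restore the backup; the caller reprocesses from scratch.
        if( -e $file ) {
            unlink $file or die "Couldn't remove partial $file: $!";
        }
        rename $bak, $file or die "Couldn't restore $bak: $!";
    }

    rename $file, $bak or die "Couldn't back up $file: $!";
    return $bak;
}

# Demonstration: a clean first run simply backs the file up.
open( my $fh, '>', 'hosts.txt' ) or die "Couldn't create file: $!";
print $fh "line\n";
close( $fh );
safe_backup( 'hosts.txt', 'hosts.txt.bak' );
```

      Note that using a fixed extension such as ".bak" rather than ".$$" is what makes the leftover detectable at all, since $$ differs on every run.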

      Yes, this is a paranoid view. To arrive here we need three failures to occur at exactly inopportune moments. However, I was involved in a project where the whole issue of automating the updating of files in a production environment became the subject of a protracted investigation to determine a mechanism for ensuring that there were NO risks involved. The machines in question were used by the cargo division of a large international airline to control the loading of freight on their fleet of 747 cargo aircraft. Accurate information about what freight has been loaded on the aircraft is paramount, as the weight of the cargo and its distribution are critical inputs to how much fuel is required, and to the handling and take-off characteristics of the aircraft when taking off from airports at high altitudes and/or in hot conditions. To complicate matters, some of the servers in question were located in tin shacks on African and Russian airfields that were little more than dust strips, with mains systems subject to frequent power cuts that often lasted longer than the UPSes could maintain.

      That was done using REXX, not Perl, but most of the same problems arise. The final conclusion of the investigation was that there is no 100% reliable way to completely automate the process. The risk can be reduced to a very low probability of occurrence, but the only way to get to 100% is to have a manual verification as the final step of the process, and only accept that the process has been completed in its entirety if that verification runs from beginning to end without interruption.

      In most real-life situations, 99% is probably good enough:)




        ...A third pass won't even see the backup as it will only be looking for the original, which no longer exists. So whilst no data has been lost, it will require a manual intervention to restore it.

        Why not simply have the script check for the existence of any backups (before renaming the "original") and assume the worst (i.e., even if there is an original, it must be corrupt)?

        It seems to me that this would eliminate the need for manual intervention without adding any risk. After all, that's exactly what the person intervening will do anyway, is it not? Then, you could have any number of power failures, all at exactly the wrong times, and the worst that will happen is the program will completely reprocess the file each time power is restored. No?
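        The policy proposed above might be sketched like this (a hedged, untested illustration; the function name, file names, and substitution are hypothetical):

```perl
use strict;
use warnings;

# On startup, if a backup exists, assume the worst about the
# "original" and restart processing from the backup. With this in
# place, any number of interruptions just cause a full reprocess.
sub recover_then_process {
    my ( $file ) = @_;
    my $bak = "$file.bak";

    if( -e $bak ) {
        # Assume $file (if present) is a corrupt partial rewrite;
        # rename overwrites it with the known-good backup.
        rename $bak, $file or die "Couldn't restore $bak: $!";
    }

    rename $file, $bak or die "Couldn't rename $file: $!";

    open( my $in,  '<', $bak  ) or die "Couldn't read $bak: $!";
    open( my $out, '>', $file ) or die "Couldn't create $file: $!";
    while( <$in> ) {
        s/\bservername\b/NEWNAME.com/g;
        print $out $_;
    }
    close( $in );
    close( $out ) or die "Couldn't flush $file: $!";
    unlink $bak or die "Couldn't delete $bak: $!";
}

# Demonstration: simulate a crash that left both a backup and a
# partial original behind, then re-run.
open( my $fh, '>', 'web.conf.bak' ) or die "Couldn't create backup: $!";
print $fh "use servername\n";
close( $fh );
open( $fh, '>', 'web.conf' ) or die "Couldn't create file: $!";
print $fh "use NEW";    # truncated partial rewrite
close( $fh );

recover_then_process( 'web.conf' );
```

        This relies on rename overwriting an existing target in one call, which, as noted above, may not hold on every platform, so the same caveats about a crash mid-rename still apply.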

        bbfu
        Black flowers blossom
        Fearless on my breath
