
Atomic use of unlink() and SQL DELETE? How?

by robins (Acolyte)
on Feb 09, 2006 at 11:48 UTC ( [id://529061] )

robins has asked for the wisdom of the Perl Monks concerning the following question:

Okay, this is the scenario:

I have a database (DBD::Pg) with metadata about some files. The files themselves are stored in a folder with the primary key from the database as the filename.

I'm trying to find the correct way to delete the record from the database and then delete the file from the filesystem (with full atomic exception handling). Be aware that the DELETE may either cascade or be blocked, depending on how the foreign key references are defined.

If either of these operations fails I must make sure both the database and the file are rolled back to the previous state (that is, NOT deleted). But how can I simulate an unlink without actually deleting the file? And what should I do if the storage filesystem (SAN) or database server goes haywire in the middle of the transaction and drops out?

The code below works at the moment, but if you set up a bunch of objects and iterate through them, you would want the entire batch job to be one unit of work. As far as I can tell, this function isn't good enough in that scenario.

The PF->transaction(1) call starts a new transaction (and disables DBI AutoCommit). PF->transaction(0) tries to commit the current transaction, and issues a rollback if any errors happened. RaiseError is on, and PrintError is off. The PF->sql() call is just a wrapper that calls prepare and execute in one go with automatic statement caching. The entire call to this function is wrapped in eval, using a generic error handler which calls rollback if RaiseError is triggered.
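
In plain DBI terms (this is only a sketch of that pattern, not the PF wrapper itself; the connection parameters and $object_id are placeholders), the call site looks roughly like this:

use DBI;

my ($db_user, $db_pass) = ('user', 'secret');      # placeholders
my $object_id = $ARGV[0];                          # object to delete

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $db_user, $db_pass,
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

eval {
    $dbh->begin_work;                              # PF->transaction(1)
    my $sth = $dbh->prepare('DELETE FROM object WHERE object_id = ?');
    $sth->execute($object_id);                     # PF->sql() does prepare+execute
    $dbh->commit;                                  # PF->transaction(0)
};
if ($@) {
    $dbh->rollback;                                # generic error handler
    die $@;
}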

If anyone has a completely different approach to this problem, please enlighten me with your wisdom.

Are there any CPAN modules available that can help in some way with this kind of problem?

This is the code I currently use:

sub new {} sub DESTROY {} # etc. ...

sub unlink {
    my $self      = shift @_;
    my $object_id = shift @_;

    unless ( defined($object_id) and $self->{'pf'}->is_uuid($object_id) ) {
        return 0; # FAIL
    }

    # Find out if we're already in a transaction;
    # don't create a transaction if we're already in one
    my $in_transaction = $self->{'pf'}->in_transaction;

    # Start new transaction if we aren't already in one
    unless ($in_transaction) {
        $self->{'pf'}->transaction(1);
    }

    # Get object information
    my $sth = $self->{'pf'}->sql(
        'SELECT object.object_id,storage_container.fs_path,object.storage_container_id,object.size
           FROM object,storage_container
          WHERE object.storage_container_id=storage_container.storage_container_id
            AND object.object_id=?', $object_id);

    # If the row count is anything other than 1, either multiple objects or
    # no objects are available; either way, it's an error.
    unless ($sth->rows == 1) {
        die __("Object not available.");
    }

    # Get object metadata from database
    my $object = $sth->fetchrow_hashref;

    # Find out if object is used as a header object somewhere else
    my $check_header_object_sth = $self->{'pf'}->sql(
        'SELECT container_id,title FROM container WHERE object_id=?', $object_id);

    # Fail if object is in use in a container.
    if ($check_header_object_sth->rows > 0) {
        (my $header_container_id, my $header_container_title)
            = $check_header_object_sth->fetchrow_array;
        die __x("Object is used as a header object in container: {title}",
                title => $header_container_title);
    }

    # Remove object from container_object (child objects of container)
    my $delete_container_sth = $self->{'pf'}->sql(
        'DELETE FROM container_object WHERE object_id=?', $object->{'object_id'});

    # Remove the object metadata record
    my $delete_sth = $self->{'pf'}->sql(
        'DELETE FROM object WHERE object_id=?', $object->{'object_id'});

    # Return error if unable to delete object metadata from database
    unless ($delete_sth->rows == 1) {
        die __"Unable to find object metadata to delete.";
    }

    # Update storage_container with new used size
    my $container_sth = $self->{'pf'}->sql(
        'UPDATE storage_container SET usedsize=usedsize-? WHERE storage_container_id=?',
        $object->{'size'}, $object->{'storage_container_id'});

    my $tmpfilename = $object->{'fs_path'} . "/" . $object->{'object_id'};

    # Check that the storage file is readable by the effective uid
    unless ( -r $tmpfilename ) {
        die __"Unreadable object.";
    }

    # Remove file
    unless (unlink $tmpfilename) {
        die __"Unable to delete object.";
    }

    # Commit data and finish transaction
    unless ($in_transaction) {
        $self->{'pf'}->transaction(0);
    }

    return 1; # OK
}

Replies are listed 'Best First'.
Re: Atomic use of unlink() and SQL DELETE? How?
by BrowserUk (Patriarch) on Feb 09, 2006 at 12:08 UTC
    But how can I simulate an unlink without actually deleting the file?

    Could you rename the file to be deleted, then perform the DB transaction, and delete the file only once the DB transaction has completed? If the transaction fails or rolls back, you rename the file back.
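
    For example (untested sketch; $dbh is assumed to be an open DBI handle, and the ".pending" suffix is only illustrative):

    # Rename first, delete the row, and only unlink once the transaction
    # has committed; rename back if anything fails.
    my $pending = "$file.pending";
    rename $file, $pending or die "rename $file: $!";

    eval {
        $dbh->begin_work;
        $dbh->do('DELETE FROM object WHERE object_id = ?', undef, $object_id);
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;
        rename $pending, $file;            # restore the file on failure
        die $@;
    }
    unlink $pending or warn "could not remove $pending: $!";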

    And what to do if the storage filesystem (SAN) or database server goes haywire in the middle of the transaction and drops out?

    If you get a major failure of this type, you would need some kind of integrity check that scans the DB, matching records to filenames; if a file is not found, it looks for the renamed version and restores (renames) it to match the DB. This would be run as part of the DB startup/recovery procedure.
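
    Such a recovery pass might look roughly like this (sketch only; the table and column names follow the OP's schema, and the ".pending" convention is assumed):

    # For every object row, make sure its file exists; if only the renamed
    # (.pending) copy is present, restore it to match the DB.
    my $sth = $dbh->prepare(
        'SELECT o.object_id, c.fs_path
           FROM object o JOIN storage_container c USING (storage_container_id)');
    $sth->execute;
    while (my $row = $sth->fetchrow_hashref) {
        my $file = "$row->{fs_path}/$row->{object_id}";
        next if -e $file;
        if (-e "$file.pending") {
            rename "$file.pending", $file;     # restore the renamed copy
        } else {
            warn "file missing for object $row->{object_id}\n";
        }
    }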


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Even though the rename method seems useful, you're still not sure whether the file is deletable, or what? As you noticed, I'm using -r to determine whether I can delete the file. Is this the correct method? Or should I be checking the directory instead? I seem to recall reading somewhere that the ability to delete a file is determined by write access on the containing directory. Am I wrong?

      How would this rename trick work if I try to delete multiple objects by calling the function multiple times? If I get a rollback in the middle, it can't really roll back the entire unit of work because some of the files are already deleted...

        1. Even though the rename method seems useful, you're still not sure whether the file is deletable, or what?

          The answer to this, and to several of the other questions, will depend on whose SAN you are using, which platform, etc. However, as a general statement, if an application needs to delete files, then it should run under an account with that privilege; or the files should be created so that the account under which it runs is able to delete them.

          What I'm saying here is that with the exception of general purpose system utilities, applications shouldn't need to check these privileges. It is really an administrative problem to ensure that the appropriate privileges are accorded to the relevant userids/groups and it should be enough to ensure that the application(s) responsible for creating and manipulating these files run under the correct accounts and set the file attributes correctly.

        2. As you noticed, I'm using -r to determine if I can delete the file. Is this the correct method?

          That will depend upon the SAN you are using and the host OS you are running under, and will be different depending upon whether you are using a *nix-like OS, Win32 or some other variant.

        3. Or should I be checking the directory instead?

          See above.

        4. I seem to recall reading somewhere that the ability to delete a file is determined by write access on the containing directory. Am I wrong?

          That sounds vaguely correct for *nix FSs, but I know little about them. The picture is rather more complex on Win32. And it could be different again depending upon which SAN you are using, regardless of which OS you are running on.

          You will need to consult the SAN docs, and probably experiment, to find out the answers.

        5. How would this rename trick work if I try to delete multiple objects by calling the function multiple times?

          If I get a rollback in the middle, it can't really roll back the entire unit of work because some of the files are already deleted...

          You would need to structure your application differently. Indeed, if you are doing transaction-based work, you will need to anyway.

          For the rename mechanism to work, you would need to accumulate the names of the (renamed) files in the code calling the function until you do a commit and only delete the files once you have confirmation of the commit.

          Another potential problem is other applications/instances going looking for the files (by their original names) after you've renamed them but prior to the commit, but this should be taken care of by your DB taking a write lock on the records to be deleted and holding it until the transaction is committed or rolled back. It is just important to do the rename after the delete of the associated record.
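
          Roughly (sketch; $dbh, @object_ids and path_for() are only illustrative names):

          # Collect the renamed files and unlink them only once the whole
          # unit of work has committed.
          my @pending;
          eval {
              $dbh->begin_work;
              for my $object_id (@object_ids) {
                  my $file    = path_for($object_id);   # however you map id -> path
                  my $renamed = "$file.pending";
                  $dbh->do('DELETE FROM object WHERE object_id = ?', undef, $object_id);
                  rename $file, $renamed or die "rename $file: $!";
                  push @pending, [ $file, $renamed ];
              }
              $dbh->commit;
          };
          if ($@) {
              $dbh->rollback;
              rename $_->[1], $_->[0] for @pending;     # undo the renames
              die $@;
          }
          unlink $_->[1] for @pending;                  # commit confirmed, safe to delete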


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Atomic use of unlink() and SQL DELETE? How?
by tirwhan (Abbot) on Feb 09, 2006 at 14:22 UTC

    You could probably use hard links for this. Before deleting the files, create hard links to them in a temporary directory, then delete the originals. The files will continue to exist because of the hard links, but will no longer be accessible at their original locations. Once all your database work is done you can just delete the temporary directory and all the scheduled files will be gone, or link them back to their original places if the DB transaction failed.
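
    For example (sketch; the temporary directory, @files_to_delete and $db_ok are only illustrative, and the hard links must live on the same filesystem as the originals):

    use File::Basename qw(basename);

    # Park hard links in a temporary directory, unlink the originals, then
    # clean up or restore depending on how the DB transaction ends.
    my $tmpdir = '/data/objects/.pending';
    my @moved;
    for my $file (@files_to_delete) {
        my $parked = "$tmpdir/" . basename($file);
        link $file, $parked or die "link $file: $!";   # hard link, not a copy
        unlink $file        or die "unlink $file: $!";
        push @moved, [ $file, $parked ];
    }

    # ... run the database transaction ...

    if ($db_ok) {
        unlink $_->[1] for @moved;                     # drop the parked links
    } else {
        for my $pair (@moved) {
            link $pair->[1], $pair->[0] or warn "restore $pair->[0]: $!";
            unlink $pair->[1];
        }
    }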


    All dogma is stupid.
Re: Atomic use of unlink() and SQL DELETE? How?
by pboin (Deacon) on Feb 09, 2006 at 13:31 UTC

    If it were me, I'd not kill myself trying to do this perfectly. Getting the atomic transaction right, along with the cascading deletes and so on, seems not to be worth it.

    Have you considered relaxing your requirements a bit so that the delete works correctly in all but the most extreme circumstances, and then just running a periodic maintenance job to make sure files and database records stay consistent in both directions?
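
    Such a maintenance job could be as simple as this (sketch; the table and column names follow the OP's code, and @storage_dirs is assumed to list the fs_path directories):

    # Periodic reconciliation:
    #  - object rows with no file are dangling metadata (report them)
    #  - files with no matching object row are orphans (report or remove)
    my %in_db;
    my $sth = $dbh->prepare(
        'SELECT o.object_id, c.fs_path
           FROM object o JOIN storage_container c USING (storage_container_id)');
    $sth->execute;
    while (my ($id, $path) = $sth->fetchrow_array) {
        my $file = "$path/$id";
        $in_db{$file} = 1;
        warn "missing file for object $id: $file\n" unless -e $file;
    }
    for my $dir (@storage_dirs) {
        opendir my $dh, $dir or next;
        for my $name (readdir $dh) {
            next if $name =~ /^\./;
            warn "orphaned file: $dir/$name\n" unless $in_db{"$dir/$name"};
        }
        closedir $dh;
    }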

    Perfection is cool, but can be dear to implement.

      Do you have a good suggestion for the "almost perfect" solution?

      Could you point me in the right direction about what I'm doing wrong in the current function? I know I'm doing something wrong, but I think I've managed to blindfold myself while looking at the problem...

Re: Atomic use of unlink() and SQL DELETE? How?
by valdez (Monsignor) on Feb 10, 2006 at 12:17 UTC

    Why don't you first delete the data from the database and then from the filesystem? That's the way I implemented a similar unlink procedure; this way you use the rollback feature of your DB and postpone the unlinking of the file until just after the commit. If the database operations fail, the file will not be touched; if the procedure can't unlink the file, the inconsistency can be repaired by a periodic fsck-like cleanup job. Am I wrong?
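
    In outline (sketch; $dbh, $object_id and $fs_path are only illustrative):

    # Do all database work inside the transaction, commit, and only then
    # touch the filesystem; a failed unlink just leaves work for the
    # periodic cleanup job.
    eval {
        $dbh->begin_work;
        $dbh->do('DELETE FROM container_object WHERE object_id = ?', undef, $object_id);
        $dbh->do('DELETE FROM object WHERE object_id = ?',           undef, $object_id);
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;
        die $@;                        # file untouched, nothing to repair
    }
    unlink "$fs_path/$object_id"
        or warn "left stale file $fs_path/$object_id: $!";   # cleanup job will catch it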

    Ciao, Valerio

      This seems like the most appropriate solution for my application. I'll just have to build the cleanup process into the backend scheduler. Thanks for the tip (and sorry about the long delay in answering).
