PerlMonks  

Race condition in Linux::Inotify2 between dir and new file?

by Your Mother (Archbishop)
on Jun 24, 2009 at 22:06 UTC (id://774531)

Your Mother has asked for the wisdom of the Perl Monks concerning the following question:

I'm getting a race condition in some Linux::Inotify2 code where I'm watching a directory for IN_CREATE events in order to process the created files. The following works perfectly 99% of the time.

    use strict;
    use warnings;
    use Carp;
    use Path::Class;
    use XML::LibXML;
    use Linux::Inotify2;

    my $drop_dir = "/foo/bar";

    my $inotify = Linux::Inotify2->new
        or die "unable to create new inotify object: $!";

    $inotify->watch($drop_dir, IN_CREATE, sub {
        my $e = shift;
        my $name = $e->fullname;
        process_drop_file($name);
    });

    1 while $inotify->poll;

    sub process_drop_file {
        my $file = Path::Class::File->new(shift);
        -f $file or croak "'$file' is not a file";
        my $doc = XML::LibXML->new->parse_file("$file");
        # Do stuff with it...
        $file->remove or croak "Couldn't remove '$file'";
    }

But then bam-

/foo/bar/test.xml:1: parser error : Start tag expected, '<' not found

The file is there and intact by the time I can check manually, of course, but XML::LibXML tried to read it too early. This means it passed the -f test but had no content yet, I think.

Among several other dead-ends, like playing with the cookie and watch args, I tried this-

    $inotify->watch($drop_dir, IN_CREATE, sub {
        my $e = shift;
        my $name = $e->fullname;
        $inotify->watch($name, IN_CLOSE, sub {
            process_drop_file($name);
        });
    });

-in a misguided attempt to move the event check to the file, but of course this could only succeed in the 1 in 100 case where the race condition hits.

What am I missing or doing wrong? Thank you!

Replies are listed 'Best First'.
Re: Race condition in Linux::Inotify2 between dir and new file?
by kennethk (Abbot) on Jun 24, 2009 at 22:16 UTC
    Given that your issue is XML::LibXML is trying to read an empty file, perhaps you could use -s to test for non-zero size, and institute a waiting loop until it passes that condition? Something like:

        sub process_drop_file {
            my $file = Path::Class::File->new(shift);
            -f $file or croak "'$file' is not a file";
            sleep(1) until (-s $file);    # <-- New line
            my $doc = XML::LibXML->new->parse_file("$file");
            # Do stuff with it...
            $file->remove or croak "Couldn't remove '$file'";
        }

      I am still hoping there is an event or setting solution, but if not, the -s check is probably a good approach and I may do just that (with a max_check safety valve or something in case a real 0-size file comes through). Thanks.
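
The "safety valve" idea could look something like the sketch below. It is only an illustration: the helper name `wait_for_content` and its `max_checks` and `interval` options (with their defaults) are made up here, not part of any module.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # core module; allows fractional sleeps

# Hypothetical helper: poll until $file is non-empty, giving up after
# $max_checks attempts so a genuinely empty file cannot block forever.
sub wait_for_content {
    my ($file, %opt) = @_;
    my $max_checks = $opt{max_checks} // 10;     # assumed default
    my $interval   = $opt{interval}   // 0.5;    # seconds, assumed default

    for my $attempt (1 .. $max_checks) {
        return 1 if -s $file;                    # true once size > 0
        sleep $interval if $attempt < $max_checks;
    }
    return 0;    # still empty (or missing) after all checks
}
```

Inside process_drop_file this might be called as `wait_for_content("$file") or croak "'$file' is still empty";`. Note that a non-zero size only shows that writing has started, not that it has finished, so this narrows the race rather than closing it.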

      What if, during the seconds you spend sleeping while waiting for the file to grow, other files are created in the watched directory?
      Will Inotify2 call process_drop_file for those as well?
      Maybe starting two threads, one that puts new files in a queue and another that reads them from the queue, would be more appropriate, so that none are skipped unintentionally.
Re: Race condition in Linux::Inotify2 between dir and new file?
by Crackers2 (Parson) on Jun 25, 2009 at 03:23 UTC

    Wouldn't you be better off using the IN_CLOSE_WRITE and IN_MOVED_TO events?

    IN_CREATE sounds to me like it fires when the file gets created, i.e. at open time, while it seems you're more interested in knowing when the file contents are fully there.

    IN_CLOSE_WRITE sounds like it would take care of this for you for new files that get created in the watched directory (which I assume also covers a copy), while IN_MOVED_TO would take care of existing files that get added to the directory (either through move or link).

    (Big disclaimer: I haven't used inotify myself, I'm just speculating based on gut instinct and the docs)
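
A minimal sketch of that suggestion, for illustration only. The wrapper name `make_watcher` and its `$handler` argument are hypothetical; the `watch`, `fullname`, and `poll` calls and the `IN_*` constants are the ones Linux::Inotify2 exports and the original post already uses.

```perl
use strict;
use warnings;
use Linux::Inotify2;

# Hypothetical wrapper: watch $drop_dir and call $handler->($path) only
# for files that have been closed after writing or moved into place.
sub make_watcher {
    my ($drop_dir, $handler) = @_;

    my $inotify = Linux::Inotify2->new
        or die "unable to create new inotify object: $!";

    # IN_CLOSE_WRITE: a file opened for writing in $drop_dir was closed.
    # IN_MOVED_TO:    a file was renamed or moved into $drop_dir.
    $inotify->watch($drop_dir, IN_CLOSE_WRITE | IN_MOVED_TO, sub {
        my $e    = shift;
        my $name = $e->fullname;
        return unless -f $name;    # skip directories and vanished files
        $handler->($name);
    }) or die "unable to watch '$drop_dir': $!";

    return $inotify;
}
```

With the code from the question, usage might be `my $inotify = make_watcher("/foo/bar", \&process_drop_file); 1 while $inotify->poll;`. The `-f` guard returns quietly instead of croaking, on the assumption that an event for something that is not (or is no longer) a plain file can simply be ignored.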

Re: Race condition in Linux::Inotify2 between dir and new file? (cp then mv)
by tye (Sage) on Jun 25, 2009 at 06:59 UTC

    Best practice for putting stuff into such monitored directories is to put the file into the directory with a known-temporary name (using a convention such as appending ".tmp" to the file name) and then to rename the file (to remove the ".tmp" suffix) next. Renaming a file via rename is an atomic file-system operation so if you see a file w/o a *.tmp name, there is no race condition involved if you then try to open it and read it (the file has already been completely written).

    But, yes, using IN_CLOSE_WRITE and IN_MOVED_TO makes a lot more sense. I didn't find the description of IN_CLOSE_WRITE clear in the module documentation but "(a watched file or) a file within a watched directory was closed, after being opened in writeable mode (this does not necessarily imply the file was written to)" seems clear and what you want.

    - tye        

      Thanks much (and to Crackers2). IN_CLOSE_WRITE moves the problem around a little. It triggers the -f $file or croak sometimes. IN_MOVED_TO doesn't fire for this particular operation. I like the .tmp idea. I searched for "atomic" and "inotify" but didn't consider looking for atomic Perl stuff. I'll give it a spin.
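
For the producer side, the ".tmp then rename" convention might be sketched as follows. This assumes you control the program that drops the files; the helper name `drop_file_atomically` is invented for the example.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical producer-side helper: write under a ".tmp" name inside the
# drop directory, then rename into place once the content is complete.
sub drop_file_atomically {
    my ($drop_dir, $name, $content) = @_;

    my $final = File::Spec->catfile($drop_dir, $name);
    my $tmp   = "$final.tmp";

    open my $fh, '>', $tmp or die "Cannot open '$tmp': $!";
    print {$fh} $content   or die "Cannot write '$tmp': $!";
    close $fh              or die "Cannot close '$tmp': $!";

    # rename is atomic when source and target are on the same filesystem,
    # which holds here because both names live in $drop_dir.
    rename $tmp, $final or die "Cannot rename '$tmp' to '$final': $!";
    return $final;
}
```

The watcher would then need to ignore names ending in ".tmp" and react to the rename instead; a rename inside a watched directory is reported as IN_MOVED_TO for the new name.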
