(Ok. Rewriting this post, now that I know "more".)
The OS is (currently) a somewhat older version of Fedora, but we are in the process of upgrading to a current release. I am told the NFS version is 3.
I am not as clear as I want to be on how this problem manifests itself, since I am only hearing about it through another engineer (my boss).
I originally thought the failure happens because the requests to start the process on each server arrive essentially simultaneously, meaning that both servers start before either one can detect the other's lock. Is this possible?
However, the original engineer is now saying that it might be because the second server simply waits until the lock is free and then runs the same process blindly. The lock type is "LOCK_EX" (BLOCKING). But that makes no sense to me: if that's the issue, couldn't he just request a NONBLOCKING lock and quit if the call returns undef? So now I have to dig further.
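To make the non-blocking idea concrete, here is a minimal sketch. It is written in Python only so it is easy to run; I am assuming the real script uses Perl's `flock`, where `flock($fh, LOCK_EX | LOCK_NB)` returns false instead of waiting. The function name `try_exclusive_lock` and the lock file path are made up for illustration, and this says nothing about whether such locks are actually honored between two NFS clients, which is still the open question here.

```python
import fcntl


def try_exclusive_lock(path):
    """Try to take an exclusive lock on `path` without waiting.

    Returns the open file object (which holds the lock) on success,
    or None if another open handle already holds the lock.
    """
    fh = open(path, "a")  # creates the lock file if it does not exist
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: give up rather than wait.
        fh.close()
        return None
    return fh


if __name__ == "__main__":
    lock = try_exclusive_lock("/tmp/job.lock")  # hypothetical path
    if lock is None:
        print("another server is already running the job; quitting")
    else:
        print("lock acquired; running the job")
        lock.close()  # closing the handle releases the lock
```

With a blocking `LOCK_EX`, the second caller would instead sit in the `flock` call until the first one finishes and then carry on, which matches the "waits, then runs blindly" description above.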
I'll ask him to get his story straight, and/or I'll have to find a way for us to test this.