“Stale file handle” error when a process tries to read a file that another process has already deleted

Question

I'm writing a stress test suite for testing distributed file systems over NFS. In some cases, when one process deletes a file while another process attempts to read from it, I get a "Stale file handle" error (116). Is this kind of error expected and acceptable in such a race condition? The test works as follows: Start x client machines. Each

Accepted Answer

This is totally expected. The NFS specification is clear about use of file handles after an object (be it file or directory) has been deleted. Section 4 clearly addresses this. For example:

"The persistent filehandle will become stale or invalid when the file system object is removed. When the server is presented with a persistent filehandle that refers to a deleted object, it MUST return an error of NFS4ERR_STALE."

This is such a common problem, it even has its own entry in section A.10 of the NFS FAQ, which says one common cause of ESTALE errors is that:

"The file handle refers to a deleted file. After a file is deleted on the server, clients don't find out until they try to access the file with a file handle they had cached from a previous LOOKUP. Using rsync or mv to replace a file while it is in use on another client is a common scenario that results in an ESTALE error."

The expected resolution is that your client app must close and reopen the file to see what has happened. Or, as the FAQ says:

"… to recover from an ESTALE error, an application must close the file or directory where the error occurred, and reopen it so the NFS client can resolve the pathname again and retrieve the new file handle."
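The close-and-reopen recovery described above could look roughly like this in a Python test client. This is only a minimal sketch, not something taken from the question or the FAQ: the function name `read_with_estale_retry`, the `attempts` limit, and the `opener` parameter are hypothetical. It assumes that a file which is missing on reopen means the deleting process won the race.

```python
import errno


def read_with_estale_retry(path, attempts=3, opener=open):
    """Read a whole file, closing and reopening it if the handle is stale.

    Returns the file contents, or None if the file no longer exists.
    `opener` is a hypothetical hook (defaults to the built-in open) that
    makes the retry logic testable without a real NFS mount.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # The with-block closes the file even when read() fails,
            # so the next iteration resolves the pathname again.
            with opener(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # The path no longer resolves: the file was deleted.
            return None
        except OSError as exc:
            if exc.errno != errno.ESTALE:
                raise  # unrelated I/O error, do not hide it
            last_error = exc  # stale handle: retry with a fresh open
    # Still stale after every attempt; report the last error to the caller.
    raise last_error
```

In a stress test like the one in the question, a `None` result or a single retried ESTALE would then count as an acceptable outcome of the delete/read race, while any other `OSError` would still fail the test.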

Answer