The file_exists() method relies on stat() to determine whether or not a file exists. Since stat() is subject to the NFS attribute cache, this information can easily be a full minute out of date. A more appropriate file existence test might be a call to open() followed by an immediate close(). This is not cached. Notably, the open()/close() sequence also flushes the NFS cache, making a subsequent call to access() (fs::file_access()) or stat() (fs::file_info()) yield fresh results.
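A minimal sketch contrasting the two existence checks. It is written in Python rather than R because os.stat(), os.open() and os.close() are thin wrappers over the same POSIX calls. The function names are hypothetical and not part of fs. On a local filesystem both functions return the same answer; the difference described above only appears on an NFS mount with attribute caching enabled. Adding O_NONBLOCK is an assumption made here so that opening a FIFO does not block (it has no effect on regular files). Because open() needs read permission, a file that exists but is unreadable raises PermissionError, which this sketch lets propagate rather than guessing. The sketch is POSIX-only.

```python
import os
import tempfile


def exists_via_stat(path):
    """stat()-based check (what file_exists() does today).

    On NFS the answer may come from the client's attribute cache.
    """
    try:
        os.stat(path)
    except (FileNotFoundError, NotADirectoryError):
        return False
    return True


def exists_via_open(path):
    """open()/close()-based check, as proposed in this issue.

    O_NONBLOCK (where available) keeps a FIFO from blocking the open.
    Other errors, e.g. PermissionError, are deliberately not swallowed.
    """
    flags = os.O_RDONLY | getattr(os, "O_NONBLOCK", 0)
    try:
        fd = os.open(path, flags)
    except (FileNotFoundError, NotADirectoryError):
        return False
    os.close(fd)  # close immediately; only the open() result matters
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "x.txt")
        with open(p, "wb") as f:
            f.write(b"hi")
        print(exists_via_stat(p), exists_via_open(p))
        print(exists_via_open(os.path.join(d, "missing")))
```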
I would propose changing file_exists() to the open/close pattern: here caching is both unexpected (other implementations use open/close to detect file presence) and not usually performance-relevant. At least, I can't think of many scenarios where an API user would check for the existence of a million files. With file_access() and file_info() the situation is different: it is more obvious that they return metadata that may be cached (on NFS specifically), and they are more likely to be called on large input vectors or many times. For those functions, documenting the caching behaviour should be enough, and file_exists() can double as a cache flush. This also requires much less code change than adding an extra flag to every function plus a dedicated cache-flush function.
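The "file_exists() doubles as a cache flush" idea could look roughly like the following. flush_then_stat() is a hypothetical helper, not an fs function: it performs the open()/close() round trip and then calls stat(), mirroring a call to file_exists() followed by file_info(). Python again stands in for the underlying POSIX calls. Whether the refresh actually happens depends on the NFS client honouring close-to-open consistency, which is an assumption here; a mount with close-to-open disabled (for example Linux's nocto option) would not revalidate on open().

```python
import os
import tempfile


def flush_then_stat(path):
    """Hypothetical helper: revalidate cached attributes, then stat().

    The open()/close() pair is the proposed cache flush; the stat()
    call afterwards is what file_info() relies on.
    """
    # Assumption: O_NONBLOCK avoids hanging on a FIFO; no-op for files.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NONBLOCK", 0))
    os.close(fd)
    # With close-to-open consistency, this stat() should see fresh metadata.
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "data.csv")
        with open(p, "wb") as f:
            f.write(b"a,b\n1,2\n")
        print(flush_then_stat(p).st_size)  # 8
```

Each call costs an extra round trip to the server per path, which is why the proposal limits the uncached behaviour to file_exists() and leaves bulk file_info()/file_access() calls cacheable.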
(Happy to submit a PR to this effect if desired.)