-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race conditions when extracting zip files #456
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. just a nitpick.
src/HealthGPS.Datastore/zip_file.cpp
Outdated
return cache_path; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!std::filesystem::exists(cache_path)) {
extract_zip_file(file_path, cache_path);
}
return cache_path;
299601c
to
67be6c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM.
I don't know if this is relevant but would std::filesystem::temp_directory_path
be useful here? Perhaps there's a reason we wouldn't want to use the system temp directory?
This way, we can recover if extraction is stopped halfway.
67be6c3
to
413d5fc
Compare
@TinyMarsh Ordinarily that would make sense, but the weird thing about what we're doing here is that we want to be able to move the temporary folder to its final destination atomically (i.e. so that another HPC job doesn't try to read the cache folder when it's only got half the files in) and moving a folder will only be atomic if it's in the same partition, which is why we do it in the same cache folder (e.g. on Linux |
Before extracting a zip file, we check whether we already have the contents of that file extracted to a cache folder and, if so, skip the extraction.
While this is safe to do if there is only one HealthGPS process running, if there are multiple processes attempting to extract the zip file at the same time, there is a potential race condition (#454):
Note that while this sounds unlikely, it is entirely possible that this could happen with multiple jobs running on the HPC (particularly given that filesystem access is often slow on the Imperial HPC).
Another possible bug is that a HealthGPS could terminate before extraction is complete, in which case, again, the cache folder will exist, but only contain a subset of the data files. In this case, the only remedy would be to delete the cache folder -- not particularly user friendly.
This PR solves both of these problems by performing extraction to a temporary folder (with a unique name) before renaming the folder. While moving a folder is not guaranteed to be atomic because it may involve copying data between devices, it seems that renaming them should be (or at least it is on POSIX, which will fix the HPC case).
Fixes #454.