Avoid monolithic pipeline state file #45
Comments
I suggest storing the hash digest of the module and all its dependencies, and serializing the validation output if needed, in the cache file and directory names, as is already done for the configuration hash digest. The devalidation (should we say invalidation?) based on the module hash digest would then be implicit, like the configuration check.
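As an illustration, here is a minimal sketch of that idea: the combined digest of a module and its dependencies becomes part of the cache directory name, so a change to any input produces a fresh directory and invalidation is implicit. The names `module_digest` and `cache_dir_for` are hypothetical, not the project's actual API.

```python
import hashlib
from pathlib import Path

def module_digest(module_path: Path, dependencies: list[Path]) -> str:
    """Hash a module file together with all its dependencies, so the
    digest changes whenever any of them changes (implicit invalidation)."""
    h = hashlib.sha256()
    for path in [module_path, *sorted(dependencies)]:
        h.update(path.name.encode())  # include the name, not just the contents
        h.update(path.read_bytes())
    return h.hexdigest()[:16]

def cache_dir_for(stage: str, module_path: Path, deps: list[Path],
                  root: Path = Path("cache")) -> Path:
    # e.g. cache/train-3fa8c1d2e9b07a44 -- a new digest means a fresh
    # directory, so stale results are never picked up by mistake
    return root / f"{stage}-{module_digest(module_path, deps)}"
```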
To handle the case of simultaneous runs with overlapping devalidated stages, we could write to a temporary cache file during execution, and check before each stage whether the corresponding cache file already exists (and its parent caches are older), to avoid re-running a stage that a simultaneous process has just finished. What do you think?
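A rough sketch of that check, assuming one cache file per stage and using file mtimes to compare against parent caches. The temporary file plus `os.replace` makes the final write atomic, so a concurrent reader never sees a half-written cache. Function names here are assumptions for illustration.

```python
import os
from pathlib import Path

def stage_already_done(cache_file: Path, parent_caches: list[Path]) -> bool:
    """Return True if another (possibly concurrent) run already produced
    this stage's cache and it is not older than any of its inputs."""
    if not cache_file.exists():
        return False
    mtime = cache_file.stat().st_mtime
    # the cache is valid only if every parent cache predates it
    return all(p.exists() and p.stat().st_mtime <= mtime for p in parent_caches)

def run_stage(cache_file: Path, parent_caches: list[Path], compute) -> None:
    if stage_already_done(cache_file, parent_caches):
        return  # a simultaneous process finished this stage first
    tmp = cache_file.with_suffix(cache_file.suffix + ".tmp")
    tmp.write_bytes(compute())
    os.replace(tmp, cache_file)  # atomic rename: readers never see a partial file
```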
Yes, sounds good :)
I forgot about the "info". It could go in a separate file for each step in the "cache" folder. I'll try a PoC.
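One possible shape for such a PoC, with a small per-step info file under the cache folder instead of a single monolithic state file. All names here (`write_stage_info`, `read_stage_info`, the `.info.json` suffix) are assumptions, not the project's actual layout.

```python
import json
from pathlib import Path

def write_stage_info(cache_dir: Path, stage: str, info: dict) -> None:
    """Store each stage's metadata in its own small JSON file inside the
    cache folder, so parallel runs never contend on one shared state file."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{stage}.info.json").write_text(json.dumps(info, indent=2))

def read_stage_info(cache_dir: Path, stage: str) -> dict:
    path = cache_dir / f"{stage}.info.json"
    return json.loads(path.read_text()) if path.exists() else {}
```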
Currently, the `pipeline.json` file containing the state of all stages is one big monolithic file. This causes problems, for instance when one wants to run the pipeline multiple times in parallel, e.g. with different random seeds: it can lead to race conditions in which `pipeline.json` is updated by one process while being read by another. Ideally, the meta information about each stage would be distributed into that stage's own folder.