Allow using cached files if connection failure #47

Open
FabianHofmann opened this issue Apr 19, 2024 · 0 comments

Context

Snakemake workflows fail when there is no internet connection, even if the required files are already cached locally. Calling the storage function with keep_local=True keeps a local copy of each file, but Snakemake still checks that the file exists online before deciding to use the cached version. This is a problem when working offline or in environments with intermittent internet access.

It would be helpful to have a feature, or an additional argument to the storage function, that lets Snakemake use cached files whenever they are present, without checking the remote source. This would make workflows robust to network outages and remove the need for internet access when the necessary data is already cached.

For instance, an argument like use_cache_if_available=True could be added to the storage function, which would make Snakemake check the local cache first and proceed if the file is available, only falling back to a remote check if it is not.
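
To make the proposed semantics concrete, here is a minimal Python sketch of the decision such a flag would control. It is not Snakemake's implementation: `choose_source` and the `remote_exists` callback are hypothetical names used only for illustration, and `use_cache_if_available` is the argument suggested above, not an existing option.

```python
from pathlib import Path
from typing import Callable


def choose_source(
    local_path: Path,
    remote_exists: Callable[[], bool],
    use_cache_if_available: bool = False,
) -> str:
    """Decide whether to use the cached file or go to the remote.

    Hypothetical helper: `remote_exists` stands in for whatever network
    check the storage backend performs and may raise on connection failure.
    """
    if use_cache_if_available and local_path.exists():
        # Cache hit: return without touching the network at all.
        return "cache"
    # Cache miss or flag disabled: fall back to the remote check.
    if remote_exists():
        return "remote"
    raise FileNotFoundError(f"{local_path} is neither cached nor available remotely")
```

With the flag enabled, a connection error inside `remote_exists` can only surface when the file is missing from the cache, which is the behaviour requested here.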

Possible workaround

As a workaround, I have considered writing custom logic that checks for each file and decides whether to use a local copy or download a new one. However, this approach requires additional boilerplate code and deviates from Snakemake’s streamlined workflow management.
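
A rough sketch of what that custom logic might look like, using only the Python standard library. The function name `fetch_or_use_local` and its parameters are assumptions made for this example. It bypasses Snakemake's storage handling entirely, which is exactly the boilerplate this issue would like to avoid.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_or_use_local(url: str, local_path: Path, timeout: float = 10.0) -> Path:
    """Return `local_path`, downloading from `url` only if no local copy exists.

    Hypothetical workaround helper, not part of Snakemake.
    """
    if local_path.exists():
        # Local copy present: no network access is attempted.
        return local_path

    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name so a failed transfer never leaves
    # a truncated file that would later be mistaken for a valid copy.
    tmp_path = local_path.with_suffix(local_path.suffix + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            with open(tmp_path, "wb") as out:
                shutil.copyfileobj(response, out)
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        tmp_path.unlink(missing_ok=True)
        raise RuntimeError(f"No local copy of {local_path} and download failed") from err

    tmp_path.replace(local_path)
    return local_path
```

A helper like this could be called from a rule's `run:` block or from a script, with the file declared as a plain local path rather than through the storage function.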

The use of local caches without remote checks would greatly improve the efficiency and reliability of data-driven workflows, especially in computational environments with limited or unreliable internet access.
