Automatically delete temporary file cache #1354

Merged


ianthomas23
Collaborator

Fixes #1118.

This links the lifetime of a temporary file cache (obtained by passing "TMP", as in fsspec.filesystem("filecache", cache_storage="TMP")) to that of the cache filesystem instance. If a specific cache_storage location is given, the behaviour is unchanged and the cache persists.

It is implemented using weakref.finalize on the CachingFileSystem instance.
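The idea can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: TempCacheOwner is a hypothetical stand-in for the caching filesystem, and the only real APIs used are weakref.finalize, tempfile.mkdtemp and shutil.rmtree.

```python
import gc
import os
import shutil
import tempfile
import weakref


class TempCacheOwner:
    """Hypothetical stand-in for a caching filesystem; not fsspec's class."""

    def __init__(self, cache_storage="TMP"):
        if cache_storage == "TMP":
            # We created the directory, so we own its cleanup.
            self.storage = tempfile.mkdtemp()
            # The callback receives the path only, never `self`; a reference
            # to `self` here would keep the object alive forever.
            self._finalizer = weakref.finalize(
                self, shutil.rmtree, self.storage, ignore_errors=True
            )
        else:
            # A user-supplied directory is left alone.
            self.storage = cache_storage
            self._finalizer = None


# Temporary cache: removed once the owner is garbage collected.
owner = TempCacheOwner()
tmp_path = owner.storage
existed_before = os.path.isdir(tmp_path)
del owner
gc.collect()
exists_after = os.path.isdir(tmp_path)

# Explicit directory: survives the owner.
keep_dir = tempfile.mkdtemp()
kept = TempCacheOwner(cache_storage=keep_dir)
del kept
gc.collect()
kept_exists = os.path.isdir(keep_dir)
shutil.rmtree(keep_dir)  # tidy up after the demonstration
```

weakref.finalize also runs any still-pending callbacks at interpreter exit, which is what makes it a reasonable fit for a cache that should not outlive the program.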

To test this I am running the cache creation within a separate process to ensure that the filesystem is deleted at the end. I was surprised this was necessary as I assumed that del fs followed by gc.collect() would ensure the filesystem was deleted, but this does not seem to be the case. This might be a misunderstanding on my part of how it all works.

@martindurant
Member

> I was surprised this was necessary as I assumed that del fs followed by gc.collect() would ensure the filesystem was deleted

FS instances are cached in fsspec.spec._Cached. You can prevent this by using skip_instance_cache=True or instance.clear_instance_cache(). The config value weakref_instance_cache can also be used so that the cache does not hold strong references to instances, but this is meant only for debugging. Note that implementation classes also have a class attribute cachable, which skips the cache if False.
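A toy model of why del fs plus gc.collect() is not enough when instances are cached. This is only a sketch, assuming a cache that holds strong references; CachedMeta and ToyFS are hypothetical names and this is not fsspec's _Cached implementation, although the skip_instance_cache and clear_instance_cache names are borrowed from it.

```python
import gc
import weakref


class CachedMeta(type):
    """Toy metaclass that keeps a strong reference to each instance it creates."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}

    def __call__(cls, *args, skip_instance_cache=False, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if not skip_instance_cache and key in cls._cache:
            return cls._cache[key]
        obj = super().__call__(*args, **kwargs)
        if not skip_instance_cache:
            cls._cache[key] = obj  # strong reference: obj outlives the caller's name
        return obj

    def clear_instance_cache(cls):
        cls._cache.clear()


class ToyFS(metaclass=CachedMeta):
    def __init__(self, storage="TMP"):
        self.storage = storage


# Cached instance: deleting the local name does not free it.
fs = ToyFS()
ref = weakref.ref(fs)
del fs
gc.collect()
alive_while_cached = ref() is not None

# Clearing the cache drops the last strong reference.
ToyFS.clear_instance_cache()
gc.collect()
alive_after_clear = ref() is not None

# Bypassing the cache: the instance goes away with its last name.
fs2 = ToyFS(skip_instance_cache=True)
ref2 = weakref.ref(fs2)
del fs2
gc.collect()
alive_uncached = ref2() is not None
```

In this model the instance is collected only after the cache lets go of it, which matches the behaviour seen in the original test.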

@ianthomas23
Collaborator Author

> FS instances are cached in fsspec.spec._Cached. You can prevent this by using skip_instance_cache=True or instance.clear_instance_cache(). The config value weakref_instance_cache can also be used so that instances are not linked, but this is meant only for debugging. Note that implementation classes also have a class attribute cachable which will skip the cache if False.

Thanks, that explains it. I should have learnt about this by now!

It is probably a truism that even if you know about n ways in which things are cached in fsspec, it turns out there are really n+1 😄

I have simplified the test now.

@tasansal

This is nice! Is there a way to do this with a specific cache directory, too?

@martindurant
Member

> Is there a way to do this with a specific cache directory, too?

I think we assume that if you bring your own directory for cached files, you take responsibility for cleaning it up.

@martindurant martindurant merged commit 302b7cc into fsspec:master Sep 14, 2023
11 checks passed
Development

Successfully merging this pull request may close these issues.

File cache not cleared on program end
3 participants