Disable all caching #1206

Closed
banesullivan opened this issue Jun 14, 2023 · 7 comments · Fixed by #1296
Comments

@banesullivan
Contributor

banesullivan commented Jun 14, 2023

There are use cases for local computation where caching isn't desired. Caching is excellent for stateless environments where multiple users may be requesting tiles/thumbnails of the same image repeatedly or even in local environments when dealing with non-pre-tiled datasets. However, caching introduces unexpected results and stale data in local, stateful environments (like me as a single user in a Jupyter Notebook).

For example, I want to experiment interactively with an algorithm that produces a new raster, which I call "ndvi.tif". I want to toy with the algorithm: tinker with the computation, re-run it, overwrite the existing data, and visualize the result with large_image to see how my changes affected it.

This scenario wouldn't be a problem if I set up some temporary file mechanism to save a new temp file for every computation, and in effect have large_image open a new file on each iteration. But I don't want to do this. I don't want to save tons of new files. I want one working file.

The trouble is, large_image caches the tile source, and this workflow isn't possible.

To simplify testing this, I have two versions of the same image, ndvi-09.tif and ndvi-11.tif. I save each in turn as ndvi.tif and attempt to reload it with large_image.

[Images: thumbnails of ndvi-09.tif and ndvi-11.tif]

The solution given in #985 isn't sufficient, as there is still some caching going on that prevents large_image from re-opening the same path as a new tile source:

import large_image
large_image.config.setConfig("cache_tilesource_maximum", 1)
large_image.config.setConfig("cache_python_memory_portion", 1_000_000_000)
!rm ndvi.tif
!cp ndvi-09.tif ndvi.tif

large_image.open('ndvi.tif')

[Output: thumbnail of ndvi.tif, showing the ndvi-09 data]

!rm ndvi.tif
!cp ndvi-11.tif ndvi.tif

large_image.open('ndvi.tif')

[Output: thumbnail of ndvi.tif, still showing the ndvi-09 data]

Uh oh! That thumbnail above isn't right! It's using the cached tile source even though I restricted the cache settings.

@banesullivan
Contributor Author

banesullivan commented Jun 14, 2023

I'm wondering if it would be easiest to implement a "dummy cache" that always misses, which users could opt in to.
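
As a rough sketch of what such a "dummy cache" could look like, here is a mapping that accepts writes but never stores them. The class name `AlwaysMissCache` is hypothetical and this is not how large_image implements its caches; it only shows the idea using the standard `collections.abc.MutableMapping` interface.

```python
from collections.abc import MutableMapping


class AlwaysMissCache(MutableMapping):
    """A hypothetical cache that accepts writes but never stores anything."""

    def __getitem__(self, key):
        raise KeyError(key)  # every lookup is a miss

    def __setitem__(self, key, value):
        pass  # discard the value so the next lookup misses too

    def __delitem__(self, key):
        raise KeyError(key)

    def __iter__(self):
        return iter(())

    def __len__(self):
        return 0


cache = AlwaysMissCache()
cache['ndvi.tif'] = 'tile source'
hit = cache.get('ndvi.tif')  # get() falls back to None on a miss
```

Any code that treats the cache as a plain mapping would then recompute the value on every access.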

@manthey
Member

manthey commented Jun 14, 2023

Here is a hacky way to do it:
Immediately after importing large_image, do large_image.cache_util.cache.LruCacheMetaclass.__call__ = lambda x, *a, **b: large_image.cache_util.cachesClear() or type.__call__(x, *a, **b)
This disables the tile source cache AND clears the tile cache whenever a new tile source is created. We probably don't want to get rid of the tile cache entirely (but I could be wrong), as then asking for a thumbnail and then tiles would be less performant. But, since reopening a file that is now different still results in the same cache keys for tiles, we need to either have cache keys that are more dependent on file properties (or maybe use a uuid4 for the source) or flush the tile cache for that file. We don't expose any way to flush the tile cache on a per-file basis, so flushing the entire tile cache is a compromise.
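
To illustrate the two key strategies mentioned here, the sketch below builds a key from file properties and, as an alternative, a random per-source identifier. The helper names `source_cache_key` and `unique_source_id` are hypothetical, and this is not large_image's key scheme. Note that a key based on size and modification time could still collide if a file is rewritten with the same size within the filesystem's timestamp resolution.

```python
import os
import tempfile
import uuid


def source_cache_key(path):
    """Hypothetical key that changes when the file is replaced or rewritten."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, st.st_mtime_ns)


def unique_source_id():
    """Alternative: a random identifier that is never shared between opens."""
    return uuid.uuid4().hex


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'ndvi.tif')

with open(path, 'wb') as f:
    f.write(b'version 09')
key_before = source_cache_key(path)

# Overwrite in place with content of a different size.
with open(path, 'wb') as f:
    f.write(b'version 11, longer')
key_after = source_cache_key(path)
```

With the uuid4 approach nothing is ever reused across opens, which trades cache hits for correctness.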

@manthey
Member

manthey commented Jun 14, 2023

A better hack is probably:
large_image.cache_util.cache.LruCacheMetaclass.__setitem__ = lambda *a, **b: large_image.cache_util.cachesClear() as it doesn't break pickling sources

@manthey
Member

manthey commented Jun 14, 2023

A better hack is probably: large_image.cache_util.cache.LruCacheMetaclass.__setitem__ = lambda *a, **b: large_image.cache_util.cachesClear() as it doesn't break pickling sources

Actually, this won't work right -- we'd have to override the __setitem__ of the cache, not the cache metaclass.
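
The distinction follows from how Python looks up special methods: `obj[key] = value` uses `__setitem__` from the type of `obj`. A toy sketch with hypothetical `Meta` and `Cache` classes (unrelated to large_image's real classes) shows that a metaclass `__setitem__` only intercepts assignment on the class itself, not on its instances.

```python
class Meta(type):
    """Toy metaclass whose __setitem__ records calls instead of storing."""

    def __setitem__(cls, key, value):
        cls.log.append(('meta', key))


class Cache(dict, metaclass=Meta):
    log = []


cache = Cache()
cache['a'] = 1  # instance assignment: handled by dict.__setitem__
Cache['b'] = 2  # class assignment: handled by Meta.__setitem__
```

So patching the metaclass leaves writes to cache instances untouched.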

@banesullivan
Contributor Author

The first approach fixes the problem for me!

@manthey
Member

manthey commented Jun 14, 2023

It will break pickling the tile source, so we still might want to define a config value to do this more correctly.

manthey added a commit that referenced this issue Sep 13, 2023
When opening a tile source, pass `noCache=True`.  In this mode, the tile
source can directly have its style modified (e.g., `source.style = <new
value>`).  This is also used when importing images into girder to avoid
flushing the cache of tile sources that are in active use.

This closes #1294.

This closes #1145.

There is a config value `cache_sources`, that, if False, makes `noCache`
default to True.

This closes #1206.
@manthey
Member

manthey commented Sep 13, 2023

When #1296 is merged, this can be accomplished with a config setting. This will NOT break pickling the tile source.
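
For reference, a sketch of how this might look in a notebook, assuming a large_image release that includes #1296. The `cache_sources` config key and the `noCache` keyword are taken from the commit message above; the wrapper function names are hypothetical, and large_image is imported lazily so the sketch loads even where the package is not installed.

```python
def disable_source_caching():
    """Make newly opened tile sources uncached by default."""
    import large_image
    large_image.config.setConfig('cache_sources', False)


def open_uncached(path):
    """Open one path as a fresh tile source, bypassing the source cache."""
    import large_image
    return large_image.open(path, noCache=True)
```

Calling `disable_source_caching()` once after import, or `open_uncached('ndvi.tif')` per file, should cause a rewritten ndvi.tif to be re-read on each open.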
