diff --git a/docs/getting-started/first-steps.md b/docs/getting-started/first-steps.md
index dcbe54d24..403724362 100644
--- a/docs/getting-started/first-steps.md
+++ b/docs/getting-started/first-steps.md
@@ -34,15 +34,15 @@ by browsing our worked-out examples illustrating pyDVL's capabilities either:
   have to install jupyter first manually since it's not a dependency of the
   library.
 
-# Advanced usage
+## Advanced usage
 
 Besides the dos and don'ts of data valuation itself, which are the subject of
 the examples and the documentation of each method, there are two main things to
 keep in mind when using pyDVL.
 
-## Caching
+### Caching
 
-PyDVL can cache the computation of the utility function
+pyDVL can cache (memoize) the computation of the utility function
 and speed up some computations for data valuation.
 It is however disabled by default.
 When it is enabled it takes into account the data indices passed as argument
@@ -58,7 +58,7 @@ the same utility function computation, is very low. However, it can be very
 useful when comparing methods that use the same utility function, or when
 running multiple experiments with the same data.
 
-pyDVL supports different caching backends:
+pyDVL supports three different caching backends:
 
 - [InMemoryCacheBackend][pydvl.utils.caching.memory.InMemoryCacheBackend]:
   an in-memory cache backend that uses a dictionary to store and retrieve
@@ -85,7 +85,7 @@ pyDVL supports different caching backends:
     Continue reading about the cache in the documentation
     for the [caching package][pydvl.utils.caching].
 
-### Setting up the Memcached cache
+#### Setting up the Memcached cache
 
 [Memcached](https://memcached.org/) is an in-memory key-value store accessible
 over the network. pyDVL can use it to cache the computation of the utility function
@@ -108,7 +108,7 @@ To run memcached inside a container in daemon mode instead, use:
 docker container run -d --rm -p 11211:11211 memcached:latest
 ```
 
-## Parallelization
+### Parallelization
 
 pyDVL uses [joblib](https://joblib.readthedocs.io/en/latest/) for local
 parallelization (within one machine) and supports using
@@ -125,7 +125,7 @@ will typically make a copy of the whole model and dataset to each worker, even
 if the re-training only happens on a subset of the data. This means that you
 should make sure that each worker has enough memory to handle the whole dataset.
 
-### Ray
+#### Ray
 
 Please follow the instructions in Ray's documentation to set up a cluster.
 Once you have a running cluster, you can use it by passing the address
diff --git a/src/pydvl/utils/caching/__init__.py b/src/pydvl/utils/caching/__init__.py
index bc72741e0..1089628bc 100644
--- a/src/pydvl/utils/caching/__init__.py
+++ b/src/pydvl/utils/caching/__init__.py
@@ -1,6 +1,7 @@
 """Caching of functions.
 
-pyDVL caches (memoizes) utility values to allow reusing previously computed evaluations.
+pyDVL can cache (memoize) the computation of the utility function
+and speed up some computations for data valuation.
 
 !!! Warning
     Function evaluations are cached with a key based on the function's signature
@@ -10,67 +11,65 @@
 
 # Configuration
 
-Memoization is disabled by default but can be enabled easily,
+Caching is disabled by default but can be enabled easily,
 see [Setting up the cache](#setting-up-the-cache).
 When enabled, it will be added to any callable used to construct a
-[Utility][pydvl.utils.utility.Utility] (done with the decorator [@memcached][pydvl.utils.caching.memcached]).
+[Utility][pydvl.utils.utility.Utility] (done with the `wrap` method of
+[CacheBackend][pydvl.utils.caching.base.CacheBackend]).
 Depending on the nature of the utility you might want to
 enable the computation of a running average of function values, see
-[Usage with stochastic functions](#usaage-with-stochastic-functions).
+[Usage with stochastic functions](#usage-with-stochastic-functions).
-You can see all configuration options under [MemcachedConfig][pydvl.utils.config.MemcachedConfig].
+You can see all configuration options under
+[CachedFuncConfig][pydvl.utils.caching.config.CachedFuncConfig].
 
-## Default configuration
+# Supported Backends
 
-```python
-default_config = dict(
-   server=('localhost', 11211),
-   connect_timeout=1.0,
-   timeout=0.1,
-   # IMPORTANT! Disable small packet consolidation:
-   no_delay=True,
-   serde=serde.PickleSerde(pickle_version=PICKLE_VERSION)
-)
-```
+pyDVL supports three different caching backends:
 
-# Supported Backends
+- [InMemoryCacheBackend][pydvl.utils.caching.memory.InMemoryCacheBackend]:
+  an in-memory cache backend that uses a dictionary to store and retrieve
+  cached values. This is used to share cached values between threads
+  in a single process.
+- [DiskCacheBackend][pydvl.utils.caching.disk.DiskCacheBackend]:
+  a disk-based cache backend that uses pickled values written to and read from disk.
+  This is used to share cached values between processes in a single machine.
+- [MemcachedCacheBackend][pydvl.utils.caching.memcached.MemcachedCacheBackend]:
+  a [Memcached](https://memcached.org/)-based cache backend that uses pickled values written to
+  and read from a Memcached server. This is used to share cached values
+  between processes across multiple machines.
 
-- [InMemoryCacheBackend][]
-- [DiskCacheBackend][]
-- [MemcachedCacheBackend][]
+  **Note:** This specific backend requires optional dependencies.
+  See [[installation#extras]] for more information.
 
 # Usage with stochastic functions
 
-In addition to standard memoization, the decorator
-[memcached()][pydvl.utils.caching.memcached] can compute running average and
-standard error of repeated evaluations for the same input. This can be useful
-for stochastic functions with high variance (e.g. model training for small
-sample sizes), but drastically reduces the speed benefits of memoization.
+In addition to standard memoization, the wrapped functions
+can compute the running average and standard error of repeated evaluations
+for the same input. This can be useful for stochastic functions with high variance
+(e.g. model training for small sample sizes), but drastically reduces
+the speed benefits of memoization.
 
-This behaviour can be activated with the argument `allow_repeated_evaluations`
-to [memcached()][pydvl.utils.caching.memcached].
+This behaviour can be activated with the option
+[allow_repeated_evaluations][pydvl.utils.caching.config.CachedFuncConfig].
 
 # Cache reuse
 
-When working directly with [memcached()][pydvl.utils.caching.memcached],  it is
+When working directly with [CachedFunc][pydvl.utils.caching.base.CachedFunc], it is
 essential to only cache pure functions. If they have any kind of state, either
 internal or external (e.g. a closure over some data that may change), then the
 cache will fail to notice this and the same value will be returned.
 
-When a function is wrapped with [memcached()][pydvl.utils.caching.memcached] for
-memoization, its signature (input and output names) and code are used as a key
-for the cache. Alternatively you can pass a custom value to be used as key with
-
-```python
-cached_fun = memcached(**asdict(cache_options))(fun, signature=custom_signature)
-```
+When a function is wrapped with [CachedFunc][pydvl.utils.caching.base.CachedFunc]
+for memoization, its signature (input and output names) and code are used as a key
+for the cache.
 
 If you are running experiments with the same [Utility][pydvl.utils.utility.Utility]
 but different datasets, this will lead to evaluations of the utility on new data
 returning old values because utilities only use sample indices as arguments (so
 there is no way to tell the difference between '1' for dataset A and '1' for
-dataset 2 from the point of view of the cache). One solution is to empty the
+dataset B from the point of view of the cache). One solution is to empty the
-cache between runs, but the preferred one is to **use a different Utility
-object for each dataset**.
+cache between runs by calling the `clear` method of the cache backend instance,
+but the preferred one is to **use a different Utility object for each dataset**.
 
 # Unexpected cache misses
 
@@ -79,7 +78,7 @@
 run across multiple processes and some reporting arguments are added (like a
 `job_id` for logging purposes), these will be part of the signature and make the
 functions distinct to the eyes of the cache. This can be avoided with the use of
-[ignore_args][pydvl.utils.config.MemcachedConfig] in the configuration.
+the [ignore_args][pydvl.utils.caching.config.CachedFuncConfig] option in the configuration.
 
 """
 from .base import *