
feat: add exact match caching #1717

Open · wants to merge 26 commits into base: main
Conversation

@ayulockin (Contributor) commented Nov 28, 2024

This PR introduces a robust caching abstraction to improve performance and prevent progress loss during API calls (e.g., LLM or embedding requests). This was brought up in #1522 and later added as an enhancement request in #1602.

Key Features:

  • Flexible Backend Support: adds a CacheInterface with DiskCacheBackend (using diskcache) and InMemoryCacheBackend (not tested yet).
  • Deterministic Cache Key Generation: so far I have been able to generate deterministic, unique cache keys.
  • Decorator-Based Integration: a simple @cacher decorator seamlessly applies caching to both synchronous and asynchronous functions and can act as a single interface. Alternatively, I could create a CacheMixin class for other classes to inherit from, but I personally like the elegance of a decorator (a rough usage sketch follows below).
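As a rough sketch of the intended decorator usage (cacher, DiskCacheBackend and CacheInterface are the names from this PR; the cache_backend keyword and the cache_dir argument are illustrative assumptions, not the final API):

from ragas.cache import cacher, DiskCacheBackend

# hypothetical backend configuration; the real defaults may differ
cache = DiskCacheBackend(cache_dir=".cache")

@cacher(cache_backend=cache)
def embed_text(text: str) -> list[float]:
    ...  # expensive synchronous call, now exact-match cached

@cacher(cache_backend=cache)
async def generate(prompt: str) -> str:
    ...  # expensive async LLM call; repeated prompts are served from the cache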

Questions:

  1. What would be a good assumption for the size of the dataset used for evaluation / the number of LLM calls made? Asking so that I can plan benchmarking accordingly. cc: @jjmachan

  2. The implemented _generate_cache_key uses the args and kwargs to find the attributes that should make the key deterministic. For now I am excluding entries whose repr contains a memory address; this could probably be done more gracefully. Alternatively, we could leverage the __repr__ or __hash__ written for PydanticPrompt, but I am not sure whether that is the consensus for most core components in the lib.

A few obvious TODOs:

  • proper abstractions
  • test for dataset generation
  • test in memory caching
  • write tests
  • benchmark for latency
  • docs

@jjmachan (Member)

Great to see this draft 🙂

This should give you the distribution for the last month @ayulockin:

[image: distribution of evaluation run sizes over the last month]

this ignores rows which had the value 1 (assumed to be single evaluations)

@ayulockin (Contributor, Author)

Thanks @jjmachan, I will stress test on an evaluation set of 300 rows (off the top of my head). I think that would be a sufficient amount given the 95th percentile is 120 rows and the average is 40. Will benchmark a few methods and let you know. Super useful data points.

@jjmachan (Member)

hey @ayulockin I was thinking: instead of benchmarking evaluation, we have to evaluate prompt inputs here, right? That would be a more accurate benchmark. We could synthetically create a few of these too.

but a few questions

  1. what aspects are we benchmarking/measuring?
  2. what different situations should we consider? (I think mostly it will be different key lengths, but are there any other factors?)

@ayulockin (Contributor, Author)

instead of benchmarking evaluation, we have to evaluate prompt inputs here, right?

Caching should add some latency. I want to benchmark evaluation/testset generation (as a case study) with caching enabled and without it. The difference should not be noticeable imo. Running this is easy.

I am not sure what exactly you mean by "evaluate prompt inputs here". My guess is that it's about unique key generation given a prompt input, that the key is deterministic (we get the same key when we give the same prompt input), and about lookup speed/insertion speed, etc. With the evaluation benchmark I meant the same thing.

So to answer your question on "what aspects are we benchmarking/measuring?": it's latency. Also, because we are saving the raw output as the value for the key, we need to be careful about disk usage too imo (but it should not be such a big deal considering the percentile usage data you shared -- one will never run an LLM-based eval on 100k samples 😅).

what different situations should we consider? (I think mostly it will be different key lengths, but are there any other factors?)

I am generating the key like this:

def _generate_cache_key(func, args, kwargs):

it's quite generic, but I want to check it with embeddings as well (embedding models are very cheap, so it's up to you whether you want those cached too).
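For context, a minimal sketch of what such a key function could look like, assuming the approach described above (hash the function identity plus a filtered, canonical serialisation of args/kwargs; the memory-address filtering mirrors the exclusion mentioned in the PR description, and none of this is the exact implementation):

import hashlib
import json
import re

def _generate_cache_key(func, args, kwargs):
    # Values whose repr embeds a memory address (e.g. "<Foo object at 0x7f...>")
    # are not deterministic across runs, so they are excluded from the key.
    def is_stable(value) -> bool:
        return re.search(r" at 0x[0-9a-fA-F]+>", repr(value)) is None

    payload = {
        "func": f"{func.__module__}.{func.__qualname__}",
        "args": [repr(a) for a in args if is_stable(a)],
        "kwargs": {k: repr(v) for k, v in kwargs.items() if is_stable(v)},
    }
    # Hash a canonical JSON serialisation so identical inputs always map to the same key.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()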

TLDR;

| Aspect | Metric | How to Measure |
| --- | --- | --- |
| Cache Lookup Speed | Lookup latency (ms) | Measure time to fetch from cache for keys of varying lengths. |
| Cache Insertion Speed | Insertion latency (ms) | Measure time to insert new cache entries. |
| End-to-End Latency | Total function time (ms) | Compare cached vs. uncached function calls. |
| Cache Hit Rate | Hit rate (%) | Track cache hits vs. misses during repeated evaluations. |
| Memory/Disk Overhead | Cache size (MB/GB) | Monitor cache size with different inputs and backends. |
| Key Length Impact | Latency by key length | Test small vs. large keys for hashing and lookup time. |
| Concurrency | Latency under load (ms) | Simulate multiple threads/processes using the cache concurrently. |
| Backend Comparison | Latency for disk vs. memory | Compare diskcache and in-memory caching for similar workloads. |

@ayulockin (Contributor, Author) commented Nov 30, 2024

Ran evaluation with and without caching on the rungalileo/ragbench "tech" subset with 314 samples in the test set.

Without caching: 6 minutes 32.21 seconds

With caching: 5 minutes 46.31 seconds


Hence diskcache adds practically no overhead, because it is overlapped with OpenAI's latency.

Running the same eval that was cached took only 3.27 seconds.


The resulting .cache/cache.db takes only 2 MB for this example. I don't think key_length is a concern here.


Thoughts:

  • diskcache is good enough; anything else would be over-engineering at this point.
  • caching should be parameterised, i.e. the user should easily be able to turn it on or off; given the non-deterministic nature of evals, sometimes one would want to run without the cache.

@jjmachan (Member)

(but it should not be such a big deal considering the percentile usage data you shared -- one will never run an LLM-based eval on 100k samples 😅).

this actually doesn't hold for testset generation, especially since there are a lot of transforms involved which process a lot of documents

I am not sure what exactly you mean by "evaluate prompt inputs here". My guess is that it's about unique key generation given a prompt input, that the key is deterministic (we get the same key when we give the same prompt input), and about lookup speed/insertion speed, etc. With the evaluation benchmark I meant the same thing.

One area that might perform differently is testset generation, which will behave differently compared to what gets cached during evaluation. This is why I thought we should consider a different benchmark, but that is overkill. Instead, what we could do is benchmark how caching affects the testset generation module too - what do you think?

  • diskcache is good enough; anything else would be over-engineering at this point.

  • caching should be parameterised, i.e. the user should easily be able to turn it on or off; given the non-deterministic nature of evals, sometimes one would want to run without the cache.

  • should we consider the trade-offs for in-memory? I don't think we have to, because even though disk caching is way slower than in-memory, it is cheaper and can store more, and given that the biggest bottleneck is the LLMs it should be a great fit
  • yep, let's do that. How many aspects of this should we configure? 2 things I can think of are
  1. on/off (if there are any caching-backend-specific args, they will come here too)
  2. which backend to use

How users will set this config is the question.

@ayulockin (Contributor, Author)

Instead, what we could do is benchmark how caching affects the testset generation module too - what do you think?

You are right. Test set generation is where I am right now.

should we consider the trade-offs for in-memory? I don't think we have to, because even though disk caching is way slower than in-memory, it is cheaper and can store more, and given that the biggest bottleneck is the LLMs it should be a great fit

In-memory should be an option for users -- maybe test generation benefits from it (I just started investigating). The default should be diskcache.

on/off (if there are any caching-backend-specific args, they will come here too)

I think I have a way to do it cleanly. Will let you know.

@ayulockin (Contributor, Author)

The caching works fine with test generation.

But I faced a few issues from my dev branch while running test generation, so I did a fresh install in a new environment, cloning directly from the main branch (no fork), to check whether the issues persist.

I have documented them here #1718. I am either doing something stupid or the issues are real.

@jjmachan (Member) commented Dec 2, 2024

thanks a lot for the update @ayulockin 🙂

should we aim to just work on diskcache for now and then open an issue to get more feedback on this?

@ayulockin ayulockin marked this pull request as ready for review December 7, 2024 11:05
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 7, 2024
@ayulockin (Contributor, Author)

@jjmachan the PR is ready for review.

@ayulockin (Contributor, Author)

@jjmachan moved caching to the async def generate method of BaseRagasLLM as per our discussion.

Comment on lines 85 to 88:

@cacher()
async def generate(
    self,
    prompt: PromptValue,
Member

not sure about this approach

@jjmachan (Member) left a comment

@ayulockin there are a couple of things here

Because cacher is a decorator, users don't have control over this other than via the environment variable, so this has the same problem as having it in PydanticPrompt. In my mind there were 2 use cases:

  1. user has to enable/disable caching
  2. user has to be able to change the caching service

I'll write down how I was thinking about this.

1. enable caching

from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, caching=True)

# now you can use the LLM as you want
...

the same applies for embeddings too - this was something missing in this PR.
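For embeddings, the same pattern might read roughly like this (LangchainEmbeddingsWrapper is the existing ragas embeddings wrapper; the caching keyword here is only the proposal above, not a confirmed API):

from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import OpenAIEmbeddings

# hypothetical: mirrors the caching keyword proposed for the LLM wrapper
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(), caching=True)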

2. customize caching

# implement a cache backend from the interface
from ragas.cache import CacheInterface

# maybe with a Redis backend
class RedisCacher(CacheInterface):
    ...  # implementation

# use with the LLMWrapper itself
gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, cacher=RedisCacher())

Now, one con I see of this approach is that there are 2 keywords we have to add, so for the "enable caching" use case I was thinking maybe we can join them:

from ragas.cache import DefaultCacher
# we can name it DiskCacheCacher too

# use with LLMWrapper itself
gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, cacher=DefaultCacher())

what do you think?

@ayulockin (Contributor, Author)

Hey @jjmachan, this is great. Let me quickly implement this.

what do you think?

This is clearly a cleaner approach from a user pov (env vars can be scary, especially in a big project). My approach was to make caching a hidden thing with just the controls to turn it on/off and select the backend, but I find your idea more appealing.

@ayulockin (Contributor, Author)

I went with the evaluator_llm = LangchainLLMWrapper(gpt4o, cache=DiskCacheBackend()) approach.
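For reference, a sketch of the resulting usage under that approach (DiskCacheBackend and LangchainLLMWrapper come from the discussion above; the exact constructor arguments may differ):

from ragas.cache import DiskCacheBackend
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

cacher = DiskCacheBackend()  # backed by diskcache, stored under a local .cache directory
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"), cache=cacher)

# identical prompts are now exact-match cached, so re-running the same
# evaluation reuses stored generations instead of calling the API again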
