Feature Request: hedged request between the external cache and the object storage #6712
Comments
I had a similar idea. We could use the https://github.com/cristalhq/hedgedhttp HTTP client to implement this. On top of that, we could make it better by estimating the 90th percentile and automatically sending a hedged request once the duration exceeds it. T-Digest seems like a good option for estimating the percentiles. Ideally, we would like to avoid having to specify a fixed threshold after which another request should be sent.
Hello @GiedriusS, can I work on this?
@Vanshikav123 sure. With cristalhq/hedgedhttp#52, that client now supports dynamic thresholds/durations, so it shouldn't be too hard to implement with t-digest 🤔
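A minimal sketch of what that could look like, assuming the plain `hedgedhttp.NewClient` constructor (the per-request recomputation enabled by cristalhq/hedgedhttp#52 is left out here) and a p90 estimate coming from some t-digest over observed latencies; the function name and the idea of passing in a precomputed p90 are illustrative, not an agreed design:

```go
package cachehedge

import (
	"net/http"
	"time"

	"github.com/cristalhq/hedgedhttp"
)

// newHedgedClient wraps a plain http.Client so that, once the first attempt has
// been in flight longer than p90Delay, a hedged copy of the request is fired.
// p90Delay is meant to come from a t-digest over observed request latencies
// rather than a hand-picked constant; upto bounds the total number of attempts.
func newHedgedClient(p90Delay time.Duration, base *http.Client) (*http.Client, error) {
	return hedgedhttp.NewClient(p90Delay, 2, base)
}
```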
@GiedriusS it would be a great help if you could provide me with some references for this issue.
Hi @GiedriusS @damnever, I am interested in working on this issue in LFX term 3.
Hi @GiedriusS! I am very interested in working on this issue through LFX. Just wondering, do I need to submit a proposal on the implementation?
Please submit everything through the LFX website 😊
Hello @GiedriusS @saswatamcode,
I am really interested in contributing to Thanos. Are there any pre-tests that I can work on? @damnever @GiedriusS
Yes, please submit everything using the LFX website 🙂
Is there any pre-task that could help me understand Thanos better? I have looked into how Thanos works.
@GiedriusS @saswatamcode I am a bit confused about how https://github.com/cristalhq/hedgedhttp would be able to achieve this task. The hedged HTTP client in that implementation sends HTTP requests to the same destination, while in this use case the first and second requests would be sent to different services: the external cache service and the object storage, respectively. Would something like a timeout-monitoring mechanism, started with the first request, that sends the second request using the same HTTP client if latency > t-digest.Quantile(90), make sense?
Yeah, sorry for the confusion 🤦 hedging between two different systems doesn't make sense. Cache operations are supposed to be ultra fast. I believe the original issue is that with some k/v storages like memcached, one is always forced to download the same data. So, if the cached data is big, it takes a long time. This could be solved by having a two-layered cache. We use client-side caching in Redis to solve this problem and it works well. With it, hot items don't need to be re-downloaded constantly because they are kept in memory. I will edit the title/description once I have some time, unless someone disagrees. And yes, I do imagine it working something like that. The hedged HTTP client works like that: it sends another request if some timeout is reached. We could use the t-digest library to avoid the guesswork of manually setting the latency after which to send another request.
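A rough sketch of that timeout-monitoring idea, not tied to hedgedhttp since the two legs go to different backends. It assumes a t-digest implementation such as github.com/influxdata/tdigest, and fromCache/fromObjStore are hypothetical stand-ins for the store-gateway's real cache and object-storage code paths:

```go
package cachehedge

import (
	"context"
	"math"
	"time"

	"github.com/influxdata/tdigest"
)

// hedgedGet asks the external cache first and, if it has not answered within
// the observed p90 cache latency, races a second request against the object
// storage. The first successful result wins; the slower one is discarded.
func hedgedGet(ctx context.Context, key string, td *tdigest.TDigest,
	fromCache, fromObjStore func(context.Context, string) ([]byte, error),
) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		data []byte
		err  error
	}
	results := make(chan result, 2)

	start := time.Now()
	go func() {
		data, err := fromCache(ctx, key)
		if err == nil {
			// Record the cache latency (in milliseconds) so the hedging
			// threshold below keeps adapting to what the cache actually does.
			td.Add(float64(time.Since(start).Milliseconds()), 1)
		}
		results <- result{data, err}
	}()

	// Wait for the cache up to the estimated p90; fall back to an assumed
	// default while the digest has too few samples to give a number.
	p90 := td.Quantile(0.9)
	if math.IsNaN(p90) || p90 <= 0 {
		p90 = 50
	}

	pending := 1
	select {
	case res := <-results:
		pending--
		if res.err == nil {
			return res.data, nil
		}
	case <-time.After(time.Duration(p90) * time.Millisecond):
	}

	// Hedge: the cache is slow (or failed), so ask the object storage too.
	pending++
	go func() {
		data, err := fromObjStore(ctx, key)
		results <- result{data, err}
	}()

	var lastErr error
	for ; pending > 0; pending-- {
		res := <-results
		if res.err == nil {
			return res.data, nil
		}
		lastErr = res.err
	}
	return nil, lastErr
}
```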
@GiedriusS If you have a moment, could you clarify this for me?
@GiedriusS Is this issue still valid? From your comment, it looks like we could still have some sort of hedging, just not using the hedgedhttp library?
Something we observe is that the external caches can still get overloaded sometimes and requests can get stuck waiting. We use a two-layered cache but it still happens sometimes. We probably need a circuit breaker for GET cache operations, as mentioned in #7010 (comment).
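For the circuit-breaker direction, a minimal sketch of gating cache GETs, assuming github.com/sony/gobreaker and the same hypothetical fromCache/fromObjStore helpers as above; the thresholds are made-up defaults, and this is not necessarily what #7010 proposes:

```go
package cachehedge

import (
	"context"
	"time"

	"github.com/sony/gobreaker"
)

// newCacheGetBreaker builds a breaker that opens after a run of failed or
// timed-out cache GETs, so an overloaded cache is skipped instead of piling
// up waiting requests. Thresholds here are assumed values, not tuned defaults.
func newCacheGetBreaker() *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "cache-get",
		Timeout: 10 * time.Second, // how long the breaker stays open before probing again
		ReadyToTrip: func(c gobreaker.Counts) bool {
			return c.ConsecutiveFailures >= 5
		},
	})
}

// getWithBreaker tries the cache through the breaker and falls straight back
// to the object storage when the breaker is open or the cache call fails.
func getWithBreaker(ctx context.Context, cb *gobreaker.CircuitBreaker, key string,
	fromCache, fromObjStore func(context.Context, string) ([]byte, error),
) ([]byte, error) {
	data, err := cb.Execute(func() (interface{}, error) {
		return fromCache(ctx, key)
	})
	if err != nil {
		return fromObjStore(ctx, key)
	}
	return data.([]byte), nil
}
```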
Is your proposal related to a problem?
Long-tail requests between the store-gateway and the external cache service are sometimes inevitable. Lowering the timeouts between the store-gateway and the cache service isn't a proper way to address this problem.
Describe the solution you'd like
If accessing the external cache service takes too long, issue a hedged request to the object storage, as object storages have reasonable latency on average nowadays.