Feature Request: hedged request between the external cache and the object storage #6712
Comments
I had a similar idea. We could use the https://github.com/cristalhq/hedgedhttp HTTP client to implement this. On top of that, we could make it better by estimating the 90th percentile and automatically sending a hedged request once the duration exceeds it. T-Digest seems like a good option for estimating the percentiles. Ideally, we would like to avoid having to specify a fixed threshold after which another request should be sent.
Hello @GiedriusS, can I work on this?
@Vanshikav123 sure. With cristalhq/hedgedhttp#52, that client now supports dynamic thresholds/durations, so it shouldn't be too hard to implement with t-digest 🤔
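A minimal sketch of what that could look like, assuming the plain `hedgedhttp.NewClient` constructor (the per-request recomputation enabled by cristalhq/hedgedhttp#52 is left out here) and a p90 estimate coming from some t-digest over observed latencies; the function name and the idea of passing in a precomputed p90 are illustrative, not an agreed design:

```go
package cachehedge

import (
	"net/http"
	"time"

	"github.com/cristalhq/hedgedhttp"
)

// newHedgedClient wraps a plain http.Client so that, once the first attempt has
// been in flight longer than p90Delay, a hedged copy of the request is fired.
// p90Delay is meant to come from a t-digest over observed request latencies
// rather than a hand-picked constant; upto bounds the total number of attempts.
func newHedgedClient(p90Delay time.Duration, base *http.Client) (*http.Client, error) {
	return hedgedhttp.NewClient(p90Delay, 2, base)
}
```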
@GiedriusS it would be a great help if you could provide me with some references for this issue.
Hi @GiedriusS @damnever, I am interested in working on this issue in LFX term 3.
Hi @GiedriusS! I am very interested in working on this issue through LFX. Just wondering, do I need to submit a proposal on the implementation?
Please submit everything through the LFX website 😊
Hello @GiedriusS @saswatamcode,
I am really interested in contributing to Thanos. Are there any pre-tests that I can work on? @damnever @GiedriusS
Yes, please submit everything using the LFX website 🙂
Is there any pre-task that could help me understand Thanos better? I have looked into how Thanos works.
@GiedriusS @saswatamcode I am a bit confused about how https://github.com/cristalhq/hedgedhttp would be able to achieve this task. The hedged HTTP client in that implementation sends HTTP requests to the same destination, while in this use case the first and second requests would be sent to different services: the external cache service and the object storage, respectively. Would something like a timeout-monitoring mechanism, started with the first request, that sends the second request using the same HTTP client if latency > t-digest.Quantile(90), make sense?
Yeah, sorry for the confusion 🤦 hedging between two different systems doesn't make sense. Cache operations are supposed to be ultra fast. I believe the original issue is that with some k/v storages like memcached, one is always forced to download the same data. So, if the cached data is big, it takes a long time. This could be solved by having a two-layered cache. We use client-side caching in Redis to solve this problem and it works well. With it, hot items don't need to be re-downloaded constantly because they are kept in memory. I will edit the title/description once I have some time, unless someone disagrees. And yes, I do imagine it working something like that. The hedged HTTP client works like that: it sends another request if some timeout is reached. We could use the t-digest library to avoid the guesswork of manually setting the latency after which to send another request.
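A rough sketch of that timeout-monitoring idea, not tied to hedgedhttp since the two legs go to different backends. It assumes a t-digest implementation such as github.com/influxdata/tdigest, and fromCache/fromObjStore are hypothetical stand-ins for the store-gateway's real cache and object-storage code paths:

```go
package cachehedge

import (
	"context"
	"math"
	"time"

	"github.com/influxdata/tdigest"
)

// hedgedGet asks the external cache first and, if it has not answered within
// the observed p90 cache latency, races a second request against the object
// storage. The first successful result wins; the slower one is discarded.
func hedgedGet(ctx context.Context, key string, td *tdigest.TDigest,
	fromCache, fromObjStore func(context.Context, string) ([]byte, error),
) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		data []byte
		err  error
	}
	results := make(chan result, 2)

	start := time.Now()
	go func() {
		data, err := fromCache(ctx, key)
		if err == nil {
			// Record the cache latency (in milliseconds) so the hedging
			// threshold below keeps adapting to what the cache actually does.
			td.Add(float64(time.Since(start).Milliseconds()), 1)
		}
		results <- result{data, err}
	}()

	// Wait for the cache up to the estimated p90; fall back to an assumed
	// default while the digest has too few samples to give a number.
	p90 := td.Quantile(0.9)
	if math.IsNaN(p90) || p90 <= 0 {
		p90 = 50
	}

	pending := 1
	select {
	case res := <-results:
		pending--
		if res.err == nil {
			return res.data, nil
		}
	case <-time.After(time.Duration(p90) * time.Millisecond):
	}

	// Hedge: the cache is slow (or failed), so ask the object storage too.
	pending++
	go func() {
		data, err := fromObjStore(ctx, key)
		results <- result{data, err}
	}()

	var lastErr error
	for ; pending > 0; pending-- {
		res := <-results
		if res.err == nil {
			return res.data, nil
		}
		lastErr = res.err
	}
	return nil, lastErr
}
```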
@GiedriusS If you have a moment, could you clarify this for me?
@GiedriusS Is this issue still valid? From your comment, it looks like we could still have some sort of hedging, just not using the hedgedhttp library?
Something we observe is that the external caches can still get overloaded sometimes and requests can get stuck waiting. We use a two-layered cache but it still happens sometimes. We probably need a circuit breaker for GET cache operations, as mentioned in #7010 (comment).
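For the circuit-breaker direction, a minimal sketch of gating cache GETs, assuming github.com/sony/gobreaker and the same hypothetical fromCache/fromObjStore helpers as above; the thresholds are made-up defaults, and this is not necessarily what #7010 proposes:

```go
package cachehedge

import (
	"context"
	"time"

	"github.com/sony/gobreaker"
)

// newCacheGetBreaker builds a breaker that opens after a run of failed or
// timed-out cache GETs, so an overloaded cache is skipped instead of piling
// up waiting requests. Thresholds here are assumed values, not tuned defaults.
func newCacheGetBreaker() *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "cache-get",
		Timeout: 10 * time.Second, // how long the breaker stays open before probing again
		ReadyToTrip: func(c gobreaker.Counts) bool {
			return c.ConsecutiveFailures >= 5
		},
	})
}

// getWithBreaker tries the cache through the breaker and falls straight back
// to the object storage when the breaker is open or the cache call fails.
func getWithBreaker(ctx context.Context, cb *gobreaker.CircuitBreaker, key string,
	fromCache, fromObjStore func(context.Context, string) ([]byte, error),
) ([]byte, error) {
	data, err := cb.Execute(func() (interface{}, error) {
		return fromCache(ctx, key)
	})
	if err != nil {
		return fromObjStore(ctx, key)
	}
	return data.([]byte), nil
}
```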
Is your proposal related to a problem?
Long-tail requests between the store-gateway and the external cache service are sometimes inevitable. Lowering the timeouts between the store-gateway and the cache service isn't a proper way to address this problem.
Describe the solution you'd like
If accessing the external cache service takes too long, issue a hedged request to the object storage, as object storages have reasonable latency on average nowadays.