Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7514 Fix channel deadlock in meta sync fetcher #7933

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mfoldenyi
Copy link

@mfoldenyi mfoldenyi commented Nov 22, 2024

Worker threads no longer exit on error, instead they collect them into a multiError and keep working.

Fix proposal for #7514

  • I added CHANGELOG entry for this change due to it fixing a deadlock that potantially affect all components
  • Change is not relevant to the end user.

Changes

Changed concurrent worker behavior in case of error in Exists call, they no longer exit on error, instead collect errors in a multiError and continue working.

Verification

I plan to force errors using minio and try to exhaust the threadpool, but I am not yet sure how exactly. Without the patch I should be able to do that, with the patch it should not be possible.

@mfoldenyi mfoldenyi marked this pull request as ready for review November 22, 2024 08:58
@mfoldenyi
Copy link
Author

@fpetkovski Could you please take a look when you have some time?

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way we could add a unit test here? I'm afraid that otherwise we will have this problem again

@mfoldenyi
Copy link
Author

@GiedriusS Sure, I will try to implement a test which simulates store errors. it should be running into the deadlock with the original code, but not with the patch. I hopefully can have something by the end of the week.

Signed-off-by: Miklós Földényi <[email protected]>

# Conflicts:
#	CHANGELOG.md
Signed-off-by: Miklós Földényi <[email protected]>
@mfoldenyi
Copy link
Author

@GiedriusS I created a PR with just the test case to show the error by itself with the code on main: https://github.com/thanos-io/thanos/pull/7966/files

The same test case is passing with the provided fix.

Signed-off-by: Miklós Földényi <[email protected]>
Signed-off-by: Miklós Földényi <[email protected]>
Signed-off-by: Miklós Földényi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants