Set managed hash labels in manager tasks and only update tasks when managed hash changes #2142
Skipping CI for Draft Pull Request.
@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik. Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.
Timed out waiting for tasks to sync, I'll need to look into it.
Being fixed in #2143.
#2143 merged
/test images
lgtm
/assign tnozicka
suite timed out
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rzetelskik, tnozicka, zimnx
/hold
/hold cancel
Description of your changes: Currently, the Scylla Manager integration controller decides whether to update the tasks defined in a ScyllaCluster's spec by checking for deep equality between each task's definition in the spec and the task's status obtained from Scylla Manager's state.
Due to discrepancies between how we and Scylla Manager store task properties, and the fact that some values are converted when a task is created or updated but not converted back when its status is read, this often led to task update hot loops, in which task update requests were sent even though the spec had not changed.
Such a scenario can be observed by specifying a repair task with, e.g., `smallTableThreshold` set:
The threshold would be converted to bytes, so that the corresponding status would be:
The value would not be converted back, so the deep equality check would always return false and an update would be triggered on every comparison, causing a hot loop.
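The hot loop described above can be reproduced in miniature. The following is a minimal, hypothetical sketch, not the operator's code: `repairTask`, `toBytes`, `managerStore` and `reconcile` are invented for illustration, the value `1GiB` is an assumed example, and only the `<n>GiB` size format is handled. It only assumes what the description states: Manager stores the threshold as a byte count and the status is compared to the spec with deep equality.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// repairTask is a hypothetical, simplified stand-in for a repair task's
// properties; it is not the operator's or Scylla Manager's actual type.
type repairTask struct {
	SmallTableThreshold string
}

// toBytes converts sizes of the form "<n>GiB" to a byte count.
// Only GiB is supported in this sketch.
func toBytes(size string) (int64, error) {
	var n int64
	if _, err := fmt.Sscanf(size, "%dGiB", &n); err != nil {
		return 0, err
	}
	return n << 30, nil
}

// managerStore simulates Manager persisting a task: the threshold is
// normalized to bytes and is never converted back when read as status.
func managerStore(t repairTask) (repairTask, error) {
	b, err := toBytes(t.SmallTableThreshold)
	if err != nil {
		return repairTask{}, err
	}
	return repairTask{SmallTableThreshold: strconv.FormatInt(b, 10)}, nil
}

// reconcile runs n syncs that decide on updates via deep equality
// and returns how many update requests were sent.
func reconcile(spec repairTask, n int) (int, error) {
	status, err := managerStore(spec)
	if err != nil {
		return 0, err
	}
	updates := 0
	for i := 0; i < n; i++ {
		if reflect.DeepEqual(spec, status) {
			continue
		}
		// The spec never changed, yet every sync sends an update.
		updates++
		if status, err = managerStore(spec); err != nil {
			return updates, err
		}
	}
	return updates, nil
}

func main() {
	updates, err := reconcile(repairTask{SmallTableThreshold: "1GiB"}, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println("update requests:", updates) // update requests: 5
}
```

Every one of the five syncs sends an update, because `"1GiB"` never deep-equals `"1073741824"`.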
There were also cases reported by users in which we suspected the constant updates were causing high CPU load: scylladb/scylla-manager#3920.
This PR addresses this issue by making use of `labels`, recently added to Scylla Manager's task API. Similarly to how we handle managed k8s resources, the scylla-manager controller now computes a hash from the task's spec and sets it as a managed hash label. Therefore, an update request is only sent when a task's spec changes in the ScyllaCluster and the managed hash no longer matches the one set in Scylla Manager's state (and, correspondingly, in the ScyllaCluster's status).

Changes in Scylla Manager's state that are not reflected in our API (e.g. when a user manually updates a task with sctool) will not trigger an update, but they will be overwritten when we eventually reconcile.
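The hash-based update decision can be sketched as follows. This is a hypothetical illustration, not the code from this PR. The label key, the `taskSpec` type, the choice of SHA-512 over the JSON encoding, and the helpers `computeManagedHash` and `needsUpdate` are all assumptions. The point it shows is that only the hash stored in Manager's labels is compared, so values that Manager converts internally no longer affect the decision.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// managedHashLabel is an assumed label key, used only for this sketch.
const managedHashLabel = "scylla-operator.scylladb.com/managed-hash"

// taskSpec is a hypothetical, simplified task definition from ScyllaCluster's spec.
type taskSpec struct {
	Name                string `json:"name"`
	Interval            string `json:"interval"`
	SmallTableThreshold string `json:"smallTableThreshold"`
}

// computeManagedHash hashes the JSON encoding of the spec. Struct fields are
// marshalled in declaration order, so equal specs yield equal hashes.
func computeManagedHash(spec taskSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha512.Sum512(data)
	return base64.StdEncoding.EncodeToString(sum[:]), nil
}

// needsUpdate compares the desired hash with the label read back from
// Manager's state; the task's property values are not compared at all.
func needsUpdate(spec taskSpec, managerLabels map[string]string) (bool, string, error) {
	desired, err := computeManagedHash(spec)
	if err != nil {
		return false, "", err
	}
	return managerLabels[managedHashLabel] != desired, desired, nil
}

func main() {
	// Hypothetical task values.
	spec := taskSpec{Name: "weekly-repair", Interval: "7d", SmallTableThreshold: "1GiB"}
	labels := map[string]string{}

	update, hash, err := needsUpdate(spec, labels)
	if err != nil {
		panic(err)
	}
	fmt.Println(update)             // true: no managed hash is set yet
	labels[managedHashLabel] = hash // what an update request would store

	update, _, _ = needsUpdate(spec, labels)
	fmt.Println(update) // false: spec unchanged, whatever Manager converted

	spec.Interval = "1d"
	update, _, _ = needsUpdate(spec, labels)
	fmt.Println(update) // true: spec changed, so the hash differs
}
```

A manual change made with sctool alters the task's properties but not its label, so under this scheme it would not trigger an update on its own.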
This PR does not extend ScyllaCluster's API with the `labels` field for Scylla Manager's tasks; at this point, we only use labels internally and don't expose them in the spec to users of our API. If there is ever a need to add it, the managed hash would take precedence over colliding labels.

As this PR further extends the logic invoked on every task sync, some parts of the code are deduplicated to reduce the bug-prone surface. Consequently, we're losing some of the duplicated unit tests and unnecessary wrappers/interfaces.
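If user-facing task labels were ever added, the precedence rule stated above (the managed hash wins over a colliding user label) could look like the hypothetical sketch below. The label key and `mergeTaskLabels` are assumptions made for illustration; nothing like this exists in the API today, as noted above.

```go
package main

import "fmt"

// managedHashLabel is an assumed label key, used only for this sketch.
const managedHashLabel = "scylla-operator.scylladb.com/managed-hash"

// mergeTaskLabels copies user-provided labels and then sets the managed
// hash last, so a colliding user label can never override it.
func mergeTaskLabels(userLabels map[string]string, managedHash string) map[string]string {
	merged := make(map[string]string, len(userLabels)+1)
	for k, v := range userLabels {
		merged[k] = v
	}
	merged[managedHashLabel] = managedHash
	return merged
}

func main() {
	user := map[string]string{
		"team":           "storage",
		managedHashLabel: "user-value", // collides with the managed label
	}
	merged := mergeTaskLabels(user, "abc123")
	fmt.Println(merged[managedHashLabel], merged["team"]) // abc123 storage
}
```

Writing the managed hash after copying user labels keeps the rule in one place, and the input map is never mutated.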
A few new unit tests verify that changes in a task's state do not trigger an update unless the managed hash differs.
These changes also uncovered a bug that had been hidden by the hot loop.
Requires:
Which issue is resolved by this Pull Request:
Resolves #1827
/kind bug
/priority important-soon
/cc