[DPE-3684] Prevent stuck raft cluster on leader departure #379

Draft
wants to merge 17 commits into
base: main
from


dragomirp
Contributor

No description provided.


codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 11.11111% with 80 lines in your changes missing coverage. Please review.

Project coverage is 69.18%. Comparing base (a0c3d80) to head (c0f41df).

Files            Patch %   Lines
src/charm.py     11.84%    67 Missing ⚠️
src/cluster.py    7.14%    13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #379      +/-   ##
==========================================
- Coverage   70.89%   69.18%   -1.71%     
==========================================
  Files          12       12              
  Lines        3030     3109      +79     
  Branches      536      556      +20     
==========================================
+ Hits         2148     2151       +3     
- Misses        768      844      +76     
  Partials      114      114              


    """Try to remove a raft member calling a partner node."""
    for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3), reraise=True):
        with attempt:
            if not self._patroni.stop_patroni():
Member


Does this change the raft cluster's leader if the unit where Patroni is being stopped was the raft leader?
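Whether the remaining units can elect a new leader after the old one stops depends, at least roughly, on Raft's majority rule: a candidate needs votes from a strict majority of the *configured* membership, and a member that has stopped but was never removed still counts toward that membership. The sketch below is a simplification that ignores log up-to-dateness and election timeouts; `majority` and `can_elect_leader` are hypothetical helpers, not part of the charm or of any raft library.

```python
def majority(cluster_size: int) -> int:
    """Votes needed to win an election: a strict majority of configured members."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, live_members: int) -> bool:
    """True if the live members alone can form a majority of the configured cluster."""
    return live_members >= majority(cluster_size)


if __name__ == "__main__":
    # Leader of a 3-member cluster stops but stays configured:
    # the two survivors are a majority of 3 and can elect a new leader.
    print(can_elect_leader(cluster_size=3, live_members=2))  # True
    # Leader of a 2-member cluster stops without being removed:
    # the lone survivor is not a majority of 2, so the cluster is stuck.
    print(can_elect_leader(cluster_size=2, live_members=1))  # False
    # If the departing member is removed from the configuration first,
    # the survivor is a majority of a 1-member cluster.
    print(can_elect_leader(cluster_size=1, live_members=1))  # True
```

Under this simplified model, removing the departing member from the membership before stopping it is what keeps small clusters from losing quorum.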

Comment on lines +1076 to +1081
if not status:
    raft_host = "localhost:2222"
    if not (status := self._get_raft_status(syncobj_util, raft_host)):
        logger.warning("Stopping unit: all raft members are unreachable")
        self._patroni.stop_patroni()
        return
Member


I'm wondering whether, in this situation, the other units can still list this unit as a member of their raft cluster.

Member


I think it can, because of the network-cut case: if this unit is only cut off from the others, they may still count it as a raft member.
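The fallback in the `+1076` to `+1081` hunk can be read as "ask the partners first, then the local raft node, and stop the service if nobody answers". The sketch below restates that flow as a standalone function, assuming (as the `if not status:` guard suggests) that partner nodes are queried before `localhost:2222`. `resolve_raft_status`, `get_status` and `stop_service` are hypothetical stand-ins for the charm's `_get_raft_status` and `stop_patroni`, not the PR's actual code.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)

# Address of the local raft node, as used in the reviewed hunk.
LOCAL_RAFT_HOST = "localhost:2222"


def resolve_raft_status(
    partner_hosts: Iterable[str],
    get_status: Callable[[str], Optional[dict]],  # hypothetical status query
    stop_service: Callable[[], None],  # hypothetical stand-in for stop_patroni
) -> Optional[dict]:
    """Return the first raft status obtained, trying partners before the local node.

    If no host answers, stop the local service and return None.
    """
    for host in [*partner_hosts, LOCAL_RAFT_HOST]:
        status = get_status(host)
        if status:
            return status
    logger.warning("Stopping unit: all raft members are unreachable")
    stop_service()
    return None


if __name__ == "__main__":
    # The partner is unreachable but the local node answers.
    statuses = {"10.0.0.2:2222": None, LOCAL_RAFT_HOST: {"leader": "10.0.0.3:2222"}}
    print(resolve_raft_status(["10.0.0.2:2222"], statuses.get, lambda: None))
```

In the network-cut case every lookup here can fail even while the other units are healthy, and stopping the service locally does nothing to their view of the membership, which is why the unit may still appear in their raft cluster.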


3 participants