Document current HA model (Epic) #361

Open
4 tasks
l-mb opened this issue Mar 2, 2023 · 7 comments
Labels
area/kubernetes (k8s and related), kind/design (Design work), kind/documentation (Improvements or additions to documentation), kind/epic (Umbrella issue for a group of related issues), kind/research (Issues that need to be researched), triage/next-candidate (This could be moved to the next milestone)

Comments

l-mb commented Mar 2, 2023

What needs to be done

Document the high-availability guarantees of s3gw on top of Longhorn.

  • What is the expected fail-over time in response to a node or process failure?
  • What does it depend on (Longhorn setup, config options, ingress used, ...)?

Why it needs to be done

Users of storage solutions expect to understand the HA model of their solutions to gauge whether it's an appropriate choice for their workloads.

This is a first step in identifying what s3gw has as-is. What's the reasonable/best thing we can do today?

Once we have that, we should have follow-ups on how to improve it (based on user/PM requirements) to enable further use cases. Once this is documented, we should also consider whether we can test/validate these claims.

Acceptance Criteria

Blog post, possibly part of the project's official documentation.

This could initially be implemented similarly to Longhorn or Rancher Telemetry. Perhaps there is an opportunity for code sharing.

Tasks

l-mb added the kind/documentation (Improvements or additions to documentation), kind/research (Issues that need to be researched), and kind/design (Design work) labels Mar 2, 2023
l-mb added this to S3GW Mar 2, 2023
github-project-automation bot moved this to Backlog in S3GW Mar 2, 2023
l-mb moved this from Backlog to Epics in S3GW Mar 2, 2023
jecluis added the area/kubernetes (k8s and related) label Mar 2, 2023
jhmarina changed the title from "[Backlog] Document current HA model" to "⛰ Document current HA model (Epic)" Mar 7, 2023
jhmarina changed the title from "⛰ Document current HA model (Epic)" to "Document current HA model (Epic)" May 8, 2023
jhmarina moved this to Epics in S3GW May 9, 2023
jhmarina moved this to Backlog in S3GW May 9, 2023
giubacc assigned giubacc and unassigned giubacc May 11, 2023

giubacc commented May 11, 2023

Is there an idea or expectation of when this activity will be scheduled in the context of the LH integration?
The topic seems quite vast (and important).

jhmarina added this to the v0.22.0 milestone May 23, 2023
jhmarina commented

Yes, @giubacc, this is planned for Milestone 4 of the Experimental Schedule, which is our v0.22.0 milestone.

jhmarina added the kind/epic (Umbrella issue for a group of related issues) and LH 1.6 labels May 23, 2023
jhmarina moved this from Backlog to Epics in S3GW Aug 17, 2023

jecluis commented Aug 21, 2023

@giubacc will be doing some initial research into what we can get out of the underlying system, and its limitations, before we evaluate what our options are for moving this forward.

giubacc referenced this issue in giubacc/s3gw Aug 30, 2023
- add research/ha/RATIONALE.md

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>
giubacc referenced this issue in giubacc/s3gw Sep 13, 2023
regular-localhost-incremental-fill-5k
regular_localhost_load_fio_64_write
regular_localhost_zeroload_400_800Kdb
regular_localhost_zeroload_emptydb
segfault_localhost_zeroload_emptydb

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>

giubacc commented Sep 19, 2023

What still needs to be done:

  • test s3gw under an actual node fault (not just a taint, but a more drastic poweroff -f)
  • integrate a client into the restart loop and measure how many operations fail, succeed, or hang
  • finalize the research PR
  • write an ADR that simply points to the research and declares the HA model for s3gw

Anything else?
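
The client-side measurement mentioned in the list could be approached along the following lines. This is only a minimal sketch, not the harness used in the research PR: `Probe`, `run_probes`, `summarize`, and the `hang_after` threshold are hypothetical names introduced here for illustration. The operation `op` is assumed to be any callable that performs one S3 request against the gateway and raises on error, with its own client-side timeout so that a hanging request eventually returns.

```python
import time
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Probe:
    started: float   # monotonic timestamp when the operation was issued
    elapsed: float   # seconds until the operation returned or raised
    outcome: str     # "ok", "failed" or "hung"


def run_probes(op: Callable[[], None], count: int, interval: float,
               hang_after: float) -> List[Probe]:
    """Call op() `count` times, `interval` seconds apart, classifying each call.

    An operation that takes at least `hang_after` seconds is counted as hung,
    regardless of whether it finally succeeded or raised (an assumption made
    for this sketch).
    """
    probes = []
    for _ in range(count):
        started = time.monotonic()
        try:
            op()
            failed = False
        except Exception:
            failed = True
        elapsed = time.monotonic() - started
        if elapsed >= hang_after:
            outcome = "hung"
        elif failed:
            outcome = "failed"
        else:
            outcome = "ok"
        probes.append(Probe(started, elapsed, outcome))
        time.sleep(interval)
    return probes


def summarize(probes: List[Probe]) -> dict:
    """Count outcomes and find the longest stretch without a successful op."""
    counts = Counter(p.outcome for p in probes)
    longest = 0.0
    window_start = None  # start time of the current run of non-ok probes
    for p in probes:
        if p.outcome == "ok":
            if window_start is not None:
                longest = max(longest, p.started - window_start)
                window_start = None
        elif window_start is None:
            window_start = p.started
    if window_start is not None:
        # The run ended while still failing: close the window at the last probe.
        last = probes[-1]
        longest = max(longest, last.started + last.elapsed - window_start)
    return {
        "ok": counts["ok"],
        "failed": counts["failed"],
        "hung": counts["hung"],
        "longest_outage_s": longest,
    }
```

The `longest_outage_s` value is only an approximation of the fail-over time as seen by a client: it is bounded by the probe interval, so a shorter interval gives a tighter estimate at the cost of more load on the gateway.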

giubacc referenced this issue in giubacc/s3gw Sep 28, 2023
- scale_deployment_0_1-k3s3nodes-zeroload-emptydb
- s3wl-putobj-100ms-clusterip
- s3wl-putobj-100ms-ingress

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>
giubacc referenced this issue in giubacc/s3gw Oct 16, 2023
- add research/ha/RATIONALE.md

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>
giubacc referenced this issue in giubacc/s3gw Oct 16, 2023
regular-localhost-incremental-fill-5k
regular_localhost_load_fio_64_write
regular_localhost_zeroload_400_800Kdb
regular_localhost_zeroload_emptydb
segfault_localhost_zeroload_emptydb

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>
giubacc referenced this issue in giubacc/s3gw Oct 16, 2023
- scale_deployment_0_1-k3s3nodes-zeroload-emptydb
- s3wl-putobj-100ms-clusterip
- s3wl-putobj-100ms-ingress

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <[email protected]>

jecluis commented Oct 25, 2023

@giubacc if there are pending tasks in this Epic (which I'm sure there are, at least with regard to documentation), please add them as tasks to this issue.

jecluis modified the milestones: v0.22.0, v0.23.0 Oct 25, 2023
vmoutoussamy added the priority/1 (Should be fixed for next release) label Nov 15, 2023

giubacc commented Nov 16, 2023

There is an ongoing task (the medik8s research) regarding HA. I think we will have to wait until the definitive path for HA is set. I'd keep this open because we will have to write something in the documentation at some point, but for now we still don't know exactly what.


jecluis commented Nov 16, 2023

@giubacc please file an issue for the documentation, and whatever you think may be needed to track the effort, and add it to this epic.

vmoutoussamy modified the milestones: v0.23.0, v0.25.0 Nov 27, 2023
jecluis added the triage/next-candidate (This could be moved to the next milestone) label and removed the priority/1 (Should be fixed for next release) and LH 1.6 labels Mar 21, 2024
jecluis added this to s3gw Mar 21, 2024
jecluis moved this to Backlog in s3gw Mar 21, 2024
jecluis removed this from the v0.25.0 milestone Mar 21, 2024
Projects
Status: Backlog
Development

No branches or pull requests

5 participants