Add decision about the HA model to be used with the s3gw with Longhorn
Related to: https://github.com/aquarist-labs/s3gw/issues/361 Signed-off-by: Giuseppe Baccini <[email protected]>
Giuseppe Baccini committed Oct 2, 2023 (1 parent 65f3a52, commit 4ad0cc2)
Showing 1 changed file with 39 additions and 0 deletions.
# s3gw High Availability model

## Context and Problem Statement

We analyzed several High Availability (HA) concepts as they apply to the s3gw when used with Longhorn.
The aim of this research is to identify an HA model we can reasonably rely on.

The full HA research conducted so far can be found here:
[High Availability research](https://github.com/aquarist-labs/s3gw/pull/685).
It contains the rationale, the motivations and the details of the tests performed.

## Considered Options

We identified 3 HA models:

- **Active/Active** (multiple s3gw instances concurrently serving the same data)
- **Active/Warm Standby** (multiple s3gw instances, one serving data, others able to take over if the active instance fails)
- **Active/Standby** (single s3gw instance, with Kubernetes restarting/redeploying as necessary on failure)

## Decision Outcome

The 3 aforementioned models differ in performance and in implementation effort.
For our use case, the *Active/Standby* model built on top of Longhorn makes
the most sense: it offers the "best" trade-off between HA characteristics and the effort
that a more fully active/distributed solution would require.

*Desirable* characteristics of the *Active/Standby* model:

- Simplicity
- Low implementation effort compared to the other models
- Expected to work mainly with Kubernetes primitives
- Compatible with RWO (ReadWriteOnce) persistent volume semantics
- Acceptable restart timings on switch-overs and fail-overs (excluding non-graceful node failure)
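
To make the "Kubernetes primitives" and "RWO" points concrete, here is a minimal sketch of how an *Active/Standby* deployment could be described. It is not the manifest shipped by the s3gw project: the resource names, the container image, the mount path, the volume size and the `longhorn` storage class name are all assumptions chosen for illustration. The sketch builds a `ReadWriteOnce` PersistentVolumeClaim and a single-replica Deployment using the `Recreate` strategy, so that a replacement pod is only created after the previous one has been terminated, and prints them as JSON.

```python
import json


def active_standby_manifests(
    name="s3gw",                          # hypothetical resource name
    storage_class="longhorn",             # assumed Longhorn StorageClass name
    size="10Gi",                          # illustrative volume size
    image="example.invalid/s3gw:latest",  # placeholder image, not a real reference
):
    """Return a Kubernetes List holding a PVC and a single-replica Deployment."""
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-data"},
        "spec": {
            # RWO: the volume can be mounted read-write by a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Single instance: Kubernetes restarts or reschedules it on failure.
            "replicas": 1,
            # Recreate terminates the old pod before creating the new one.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The mount path is an assumption for this sketch.
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "data",
                            "persistentVolumeClaim": {"claimName": f"{name}-data"},
                        }
                    ],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, deployment]}


if __name__ == "__main__":
    print(json.dumps(active_standby_manifests(), indent=2))
```

Since `kubectl` accepts JSON as well as YAML, the output could be piped to `kubectl apply -f -` on a test cluster, assuming the script is saved under a name of your choice. Nothing in this sketch addresses the non-graceful node failure case.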

Be aware that the [non-graceful node failure](https://github.com/aquarist-labs/s3gw/blob/4af657c573ce634cd16c53c20986e54817077b44/docs/research/ha/RATIONALE.md#non-graceful-node-failure) problem cannot be entirely solved with the *Active/Standby* model alone.
To address this, we have opened a [dedicated issue](https://github.com/longhorn/longhorn/issues/6803) within the Longhorn project.

For a more comprehensive explanation of this choice, please refer to the
[High Availability research](https://github.com/aquarist-labs/s3gw/pull/685) pull request.