v0.23.0
[gardener/etcd-druid]
⚠️ Breaking Changes
- [OPERATOR] The Custodian controller has been removed in favour of etcd status reconciliation handled by the etcd controller. The CLI flags `--custodian-workers` and `--custodian-sync-period` have been removed and are no longer recognised by etcd-druid. by @unmarshall [#777]
- [OPERATOR] Labels on druid-managed resources have been streamlined and no longer include `name` and `instance`. Instead, the standard labels `app.kubernetes.io/managed-by` and `app.kubernetes.io/part-of` are used, as recommended by Kubernetes. Additionally, the `app.kubernetes.io/component` label identifies the type of each component of an etcd cluster. by @unmarshall [#777]
- [OPERATOR] Creating an Etcd resource no longer requires the annotation `gardener.cloud/operation: reconcile` to be set on it for etcd-druid to reconcile it. In other words, a newly created Etcd resource is reconciled immediately, irrespective of whether etcd spec auto-reconciliation is enabled. by @unmarshall [#777]
- [OPERATOR] The CLI flag `--workers` has been renamed to `--etcd-workers`. Additionally, the etcd controller accepts the new CLI flags `enable-etcd-spec-auto-reconcile`, which controls how and when the etcd spec is reconciled, and `etcd-status-sync-period`, which specifies the duration after which an event is re-queued to ensure etcd status reconciliation. The CLI flag `ignore-operation-annotation` has been deprecated and will be removed in an upcoming release. by @unmarshall [#777]
- [OPERATOR] Volume mounts for the etcd StatefulSet have been fixed to allow TLS secrets to be specified individually for the etcd and backup-restore servers. CA and TLS certificates, relevant to the container they are mounted on, can be found at the following paths: `/var/etcd/ssl/` for etcd client-server communication, `/var/etcd/ssl/peer` for etcd peer communication, and `/var/etcdbr/ssl` for etcd-backup-restore client-server communication. by @unmarshall [#777]
- [DEVELOPER] The vendor directory has been removed from the project. Please run `make tidy` to pull dependencies into the Go module cache, initially and whenever required. by @shreyas-s-rao [#748]
- [USER] Before upgrading druid to `v0.23.0+`, please ensure that druid is running at least `v0.22.3`. This is required to avoid downtime while the new druid version upgrades the etcds, and to keep your etcds backward compatible in case you wish to downgrade back to `v0.22.3+`. by @shreyas-s-rao [#823]
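The spec-reconciliation trigger rules in the Breaking Changes above can be summarised as a small decision function. The Go sketch below is illustrative only and is not etcd-druid's implementation: `shouldReconcileSpec` is a hypothetical helper that models just the cases stated in these notes (creation, the `enable-etcd-spec-auto-reconcile` setting, and the `gardener.cloud/operation: reconcile` annotation).

```go
package main

import "fmt"

// Annotation that, before v0.23.0, was required to trigger reconciliation.
const operationAnnotationKey = "gardener.cloud/operation"

// shouldReconcileSpec is a hypothetical helper modelling when the etcd spec is
// reconciled according to the v0.23.0 release notes:
//   - a newly created Etcd resource is always reconciled;
//   - on updates, the spec is reconciled if auto-reconciliation is enabled
//     (enable-etcd-spec-auto-reconcile) or the operation annotation asks for it.
func shouldReconcileSpec(isCreate, autoReconcile bool, annotations map[string]string) bool {
	if isCreate {
		return true
	}
	if autoReconcile {
		return true
	}
	// Looking up a key in a nil map is safe and yields "".
	return annotations[operationAnnotationKey] == "reconcile"
}

func main() {
	fmt.Println(shouldReconcileSpec(true, false, nil))  // true: creation
	fmt.Println(shouldReconcileSpec(false, false, nil)) // false: update without trigger
	fmt.Println(shouldReconcileSpec(false, false,
		map[string]string{operationAnnotationKey: "reconcile"})) // true: annotated update
}
```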
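The upgrade prerequisite above (druid must already run at least `v0.22.3` before moving to `v0.23.0+`) lends itself to a pre-flight check. Here is a minimal standard-library sketch; `canUpgradeTo023` and `parseVersion` are hypothetical helpers, and the parser assumes plain `vMAJOR.MINOR.PATCH` tags without pre-release or build suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH" into its numeric components.
// Assumption: no pre-release or build metadata in the tag.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether a >= b, comparing major, minor, then patch.
func atLeast(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// canUpgradeTo023 reports whether a druid running `current` meets the
// documented minimum (v0.22.3) for upgrading to v0.23.0+.
func canUpgradeTo023(current string) (bool, error) {
	cur, err := parseVersion(current)
	if err != nil {
		return false, err
	}
	minimum, _ := parseVersion("v0.22.3") // constant input, cannot fail
	return atLeast(cur, minimum), nil
}

func main() {
	for _, v := range []string{"v0.22.2", "v0.22.3", "v0.22.7"} {
		ok, err := canUpgradeTo023(v)
		fmt.Println(v, ok, err)
	}
}
```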
📰 Noteworthy
- [OPERATOR] A new condition `DataVolumesReady` has been introduced in `etcd.Status` to capture and report PVC warnings. by @unmarshall [#777]
- [OPERATOR] The annotation `druid.gardener.cloud/ignore-reconciliation` has been deprecated. Please use `druid.gardener.cloud/suspend-etcd-spec-reconcile` instead, which provides the same behavior. by @unmarshall [#777]
- [OPERATOR] Scaling up a single-node etcd cluster with peer TLS disabled to a multi-node etcd cluster with peer TLS enabled has been improved: the process is now deterministic and no longer causes an unnecessary restart of the first etcd member, making it faster and error-free. by @unmarshall [#777]
- [OPERATOR] The CLI flag `--leader-election-resource-lock` is now deprecated, and will be set to `leases` from a future release onwards. by @unmarshall [#777]
- [OPERATOR] A new validating webhook named `sentinel` has been introduced to safeguard resources created by etcd-druid. A new annotation `druid.gardener.cloud/disable-etcd-component-protection` has also been introduced; if it is set, the sentinel webhook allows an operator to make manual changes to any resource managed by etcd-druid. This webhook is disabled by default, and can be enabled as follows:
  - If deploying druid via the binary, please pass the CLI flag `--enable-sentinel-webhook` to it. Additionally, the CLI flags `--webhook-server-bind-address`, `--webhook-server-port` and `--webhook-server-tls-server-cert-dir` need to be passed when enabling the webhook, which enforces TLS communication using the given certificates.
  - If deploying druid via the Helm charts, please set the chart value `webhooks.sentinel.enabled: true`.
  - If deploying druid via Skaffold, please set the environment variable `DRUID_ENABLE_SENTINEL_WEBHOOK=true`. This also applies when running Make targets such as `deploy`, `deploy-dev`, `deploy-debug`, `test-e2e`, etc., except for `ci-e2e-kind`.

  by @unmarshall [#777]
- [OPERATOR] The component model used for deploying resources has been replaced with a simplified `ResourceOperator` model, found under `/internal/operator`. by @unmarshall [#777]
- [OPERATOR] The CLI flag `--metrics-addr` is now deprecated. Please use `--metrics-bind-address` and `--metrics-port` instead. by @unmarshall [#777]
- [USER] Removed usage of the `*_STORAGE_API_ENDPOINT` environment variables for the Google and Azure providers. The storage API endpoint / domain will instead be consumed directly by etcd-backup-restore from the mounted backup secret. by @shreyas-s-rao [#856]
- [DEVELOPER] We are moving towards native Go tests, which also gave us the opportunity to revisit our existing unit and integration tests. This PR only partially introduces comprehensive native Go tests, for specific packages (`internal/operator`, `internal/webhook`, `internal/controller/etcd/` and `internal/utils/`). It also adds comprehensive integration tests for the etcd controller, found at `test/it/controller/etcd`. Future PRs will replace the Ginkgo-based tests with native Go tests for the remaining packages as well. by @unmarshall [#777]
- [DEVELOPER] All packages under the `/pkg` and `/controllers` directories have been moved to the new parent directory `/internal`. by @unmarshall [#777]
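During the deprecation window of `druid.gardener.cloud/ignore-reconciliation`, tooling that inspects Etcd annotations may want to honour both the old and the new key. A minimal sketch, assuming (for illustration only) that the mere presence of either annotation suspends spec reconciliation regardless of its value; `isSpecReconcileSuspended` is a hypothetical helper, not part of etcd-druid's API.

```go
package main

import "fmt"

const (
	// Annotation introduced in v0.23.0.
	suspendSpecReconcileAnnotation = "druid.gardener.cloud/suspend-etcd-spec-reconcile"
	// Deprecated annotation that provides the same behavior.
	deprecatedIgnoreReconcileAnnotation = "druid.gardener.cloud/ignore-reconciliation"
)

// isSpecReconcileSuspended reports whether spec reconciliation should be
// skipped. Assumption: presence of either annotation is sufficient.
func isSpecReconcileSuspended(annotations map[string]string) bool {
	if _, ok := annotations[suspendSpecReconcileAnnotation]; ok {
		return true
	}
	_, ok := annotations[deprecatedIgnoreReconcileAnnotation]
	return ok
}

func main() {
	fmt.Println(isSpecReconcileSuspended(map[string]string{
		deprecatedIgnoreReconcileAnnotation: "true",
	})) // true
	fmt.Println(isSpecReconcileSuspended(nil)) // false
}
```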
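The `sentinel` webhook described above acts as a gate in front of druid-managed resources. The following deliberately simplified Go sketch shows the kind of admission decision involved; the `admissionInput` type, the `allowChange` function and the service-account username are hypothetical, and the real webhook may take further cases into account (for example, exempt service accounts or specific operation types).

```go
package main

import "fmt"

// Annotation that, when set on an Etcd resource, disables sentinel protection
// for the resources etcd-druid manages for it.
const disableProtectionAnnotation = "druid.gardener.cloud/disable-etcd-component-protection"

// admissionInput is a hypothetical, simplified view of a validating webhook
// request: who is asking, and the annotations on the owning Etcd resource.
type admissionInput struct {
	Username        string
	EtcdAnnotations map[string]string
}

// allowChange sketches the decision: etcd-druid itself may always change its
// resources; anyone else is allowed only if protection has been disabled.
func allowChange(in admissionInput, druidUsername string) (bool, string) {
	if in.Username == druidUsername {
		return true, "request made by etcd-druid"
	}
	if _, ok := in.EtcdAnnotations[disableProtectionAnnotation]; ok {
		return true, "component protection disabled via annotation"
	}
	return false, "changes to druid-managed resources are not allowed"
}

func main() {
	// Hypothetical service-account username for the druid deployment.
	const druidUser = "system:serviceaccount:etcd-druid-ns:etcd-druid"
	fmt.Println(allowChange(admissionInput{Username: "jane@example.com"}, druidUser))
	fmt.Println(allowChange(admissionInput{
		Username:        "jane@example.com",
		EtcdAnnotations: map[string]string{disableProtectionAnnotation: ""},
	}, druidUser))
}
```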
✨ New Features
- [OPERATOR] The Etcd resource status now includes the field `LastErrors`, which indicates any errors encountered in the last reconciliation of the Etcd resource. Custom error codes have been introduced to capture contextual information from the reconciliation run. by @unmarshall [#777]
- [OPERATOR] The Etcd resource status now includes the field `LastOperation`, which indicates the last operation performed on the Etcd resource. It includes a unique `RunID`, generated for every reconciler run, which makes it easy to sift through logs for a specific run and improves debuggability. by @unmarshall [#777]
- [DEVELOPER] `etcd-druid` now supports end-to-end testing with Azurite, the Azure Blob Storage emulator. by @renormalize [#753]
- [DEVELOPER] Builds for non-native platforms can now be done using the `docker-build` make target, instead of invoking the `docker buildx` command directly. The platform can be specified via the `PLATFORM` variable when invoking make. by @renormalize [#873]
- [USER] Added support for the new backup store provider `stackit`, which is an alias for `S3`. by @unmarshall [#777]
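The `RunID` in `LastOperation` ties a status entry to the log lines of one reconciler run. A minimal sketch of that idea follows; the `lastOperation` struct is a hypothetical stand-in (only `RunID` is named in these notes), and the 32-character hex ID from `crypto/rand` is an assumption, since etcd-druid may use a different format such as a UUID.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// lastOperation is a hypothetical, trimmed-down stand-in for the status field.
// Only RunID comes from the release notes; the other fields are assumptions.
type lastOperation struct {
	Type           string
	State          string
	RunID          string
	LastUpdateTime time.Time
}

// newRunID returns a random 128-bit identifier encoded as 32 hex characters.
func newRunID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// reconcile sketches a single reconciler run: a fresh RunID is generated,
// attached to every log line of the run, and recorded in the status.
func reconcile(etcdName string) (lastOperation, error) {
	runID, err := newRunID()
	if err != nil {
		return lastOperation{}, err
	}
	logf := func(msg string) {
		fmt.Printf("runID=%s etcd=%s msg=%q\n", runID, etcdName, msg)
	}
	logf("reconciling spec")
	logf("reconcile finished")
	return lastOperation{
		Type:           "Reconcile",
		State:          "Succeeded",
		RunID:          runID,
		LastUpdateTime: time.Now().UTC(),
	}, nil
}

func main() {
	op, err := reconcile("etcd-main")
	if err != nil {
		panic(err)
	}
	fmt.Printf("lastOperation: type=%s state=%s runID=%s\n", op.Type, op.State, op.RunID)
}
```

Grepping the logs for the `runID=` value recorded in the status then isolates exactly one run.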
🏃 Others
- [OPERATOR] The etcd-backup-restore container was started with the `SYS_PTRACE` Linux capability, which prevented etcd clusters from being created under Pod Security Standards. This capability has been removed, as it is no longer required. by @unmarshall [#777]
- [OPERATOR] Set CPU and memory requests for compaction pods. by @anveshreddy18 [#853]
- [OPERATOR] Etcd pods now mount files with `DefaultMode` set to `0640`. by @unmarshall [#777]
- [OPERATOR] Upgraded the `github.com/gardener/etcd-backup-restore` dependency from `0.26.0` to `0.29.0`. by @anveshreddy18 [#830]
- [OPERATOR]
  1. Upgraded the gardener/gardener, controller-runtime, controller-tools, k8s.io/*, logr, zap, ginkgo, uber mock and uuid dependencies.
  2. Adopted golangci-lint recommendations.
  3. Removed the dependency on gardener/gardener hack/scripts.

  by @unmarshall [#834]
- [OPERATOR] Enhanced parallel execution support in e2e tests, reducing test run time and improving the robustness of the test suite. by @seshachalam-yv [#833]
- [OPERATOR] Upgraded to Go version 1.22.4. by @unmarshall [#826]
- [OPERATOR] Updated e2e tests to support label changes during HA upgrades, preventing the reconciliation process from getting stuck and ensuring smooth transitions in deployment scenarios. by @seshachalam-yv [#838]
- [OPERATOR] Introduced new Makefile targets:
  - `deploy-dev`: starts Skaffold in dev mode, reloading druid upon changes.
  - `deploy-debug`: starts Skaffold in debug mode, allowing breakpoints to interrupt the control flow.
  - `undeploy`: uses `skaffold delete` to delete all resources installed via Skaffold.

  by @unmarshall [#777]
- [OPERATOR] `--max-backups` for LimitBasedGC can now be configured through the Etcd resource field `spec.backup.maxBackupsLimitBasedGC`. by @anveshreddy18 [#755]
- [OPERATOR] Updated README.md. by @unmarshall [#851]
- [DEVELOPER] Introduced testing guidelines, and added developer productivity scripts and make targets for stress testing, debugging integration tests, formatting, and detecting incompatible API changes. by @unmarshall [#857]
- [DEVELOPER] Fixed unit tests for the `internal/health` package, included missing tests in the Makefile `test` target, and made minor refactorings of test utility functions. by @unmarshall [#822]
- [DEVELOPER] Added Make target `make docker-clean` for cleaning up all Docker builds related to etcd-druid. by @shreyas-s-rao [#842]
- [DEVELOPER] Added Make targets `make clean-build-cache` and `make clean-mod-cache` for cleaning up the Go build and module caches respectively. by @shreyas-s-rao [#842]
- [DEVELOPER] Use the `localstack:s3-latest` image instead of `localstack:latest` in tests for faster setup. by @anveshreddy18 [#805]
- [DEVELOPER] Removed the deprecated package `k8s.io/utils/pointer` in favour of `k8s.io/utils/ptr`. by @anveshreddy18 [#861]
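The `DefaultMode` of `0640` mentioned above is an octal permission value: read/write for the owner, read-only for the group, and no access for others. This short Go snippet decodes it with the standard library and also prints the decimal equivalent, which matters because Kubernetes accepts octal mode values in YAML, whereas JSON requires the decimal form.

```go
package main

import (
	"fmt"
	"io/fs"
)

// defaultMode mirrors the DefaultMode now used for files mounted into etcd pods.
const defaultMode fs.FileMode = 0o640

func main() {
	fmt.Println(defaultMode)            // -rw-r-----
	fmt.Println(int32(defaultMode))     // 416: the value a JSON manifest would use
	fmt.Println(defaultMode&0o007 == 0) // true: no permissions for "other"
}
```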
📖 Documentation
- [OPERATOR] Introduced DEP-05: design document for operator out-of-band tasks. by @ishan16696 [#757]
[gardener/etcd-backup-restore]
⚠️ Breaking Changes
- [USER] Removed support for specifying an Azure custom endpoint via the environment variable `AZURE_STORAGE_API_ENDPOINT`. Please use the new `domain` field (via JSON or file) instead. by @shreyas-s-rao [gardener/etcd-backup-restore#759]
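To illustrate what the `domain` field above controls: Azure Blob Storage endpoints follow the pattern `https://<account>.blob.<domain>`, with `core.windows.net` as the public-cloud domain and, for example, `core.chinacloudapi.cn` for Azure China. The helper below is a hypothetical sketch of that composition, not etcd-backup-restore's code, which may build its endpoint differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// Public Azure cloud storage domain, used when no custom domain is given.
const defaultAzureDomain = "core.windows.net"

// blobServiceURL is a hypothetical helper composing the blob service endpoint
// for a storage account; an empty domain falls back to the public cloud.
func blobServiceURL(account, domain string) (*url.URL, error) {
	if account == "" {
		return nil, fmt.Errorf("storage account name must not be empty")
	}
	if domain == "" {
		domain = defaultAzureDomain
	}
	return url.Parse(fmt.Sprintf("https://%s.blob.%s", account, domain))
}

func main() {
	for _, domain := range []string{"", "core.chinacloudapi.cn"} {
		u, err := blobServiceURL("myaccount", domain)
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```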
🏃 Others
- [DEVELOPER] Added support for Azurite, which emulates Azure Blob Storage, for local development and testing. It can be enabled by setting the `AZURE_EMULATOR_ENABLED` and `AZURITE_STORAGE_API_ENDPOINT` environment variables. by @renormalize [gardener/etcd-backup-restore#699]
- [DEVELOPER] Fixed the `check` make target when run locally, and fixed a link in `docs/development/new_cp_support.md`. by @renormalize [gardener/etcd-backup-restore#754]
- [DEVELOPER] Added documentation on running `etcdbrctl` as a process for local testing, for a better developer experience and faster iteration. by @renormalize [gardener/etcd-backup-restore#723]
- [DEVELOPER] Upgraded the AWS S3 client Go module from v1.32.6 to v1.54.20. by @renormalize [gardener/etcd-backup-restore#755]
- [DEVELOPER] Improved error handling for OpenStack Swift during deletion of objects. by @renormalize [gardener/etcd-backup-restore#710]
- [DEVELOPER] Added support for using fake-gcs-server for all etcdbr functionalities. To enable it, either:
  - set the `GOOGLE_EMULATOR_ENABLED` environment variable when running the `etcdbrctl` command, or
  - set `emulatorEnabled: true` in the GCP backup secret when deploying via the Helm chart.

  by @anveshreddy18 [gardener/etcd-backup-restore#697]
- [USER] Do not rely on the snapshotter state when stopping the snapshotter. The snapshotter is now always closed when a member transitions from leader to any other state. by @avestuk [gardener/etcd-backup-restore#680]
- [USER] Added support for specifying the Google storage API endpoint via the file `~/.gcp/storageAPIEndpoint`. The environment variable `GOOGLE_STORAGE_API_ENDPOINT` is deprecated and will be removed shortly. by @shreyas-s-rao [gardener/etcd-backup-restore#759]
- [USER] Added support for specifying custom domains for Azure storage. by @shreyas-s-rao [gardener/etcd-backup-restore#759]
- [OPERATOR] Bumped the Alpine base version for the Docker build to `3.18.2`. by @shreyas-s-rao [gardener/etcd-backup-restore#638]
- [OPERATOR] etcd-backup-restore now supports server-side encryption with customer-provided keys (SSE-C) for S3-compatible providers. by @amold1 [gardener/etcd-backup-restore#719]
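SSE-C, mentioned above, follows the Amazon S3 convention: every request that writes or reads an encrypted object carries the customer key together with its algorithm and an MD5 digest of the key. The following standard-library sketch computes those three request headers. How etcd-backup-restore is configured with the key (for example, which backup-secret field holds it) is not covered in these notes, so `ssecHeaders` is a hypothetical helper.

```go
package main

import (
	"crypto/md5"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// Request header names defined by Amazon S3 for SSE-C.
const (
	hdrAlgorithm = "x-amz-server-side-encryption-customer-algorithm"
	hdrKey       = "x-amz-server-side-encryption-customer-key"
	hdrKeyMD5    = "x-amz-server-side-encryption-customer-key-MD5"
)

// ssecHeaders returns the SSE-C headers for a 256-bit customer key: the
// algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the
// raw key, which the server uses to check that the key arrived intact.
func ssecHeaders(key []byte) (map[string]string, error) {
	if len(key) != 32 {
		return nil, fmt.Errorf("SSE-C requires a 256-bit (32-byte) key, got %d bytes", len(key))
	}
	sum := md5.Sum(key)
	return map[string]string{
		hdrAlgorithm: "AES256",
		hdrKey:       base64.StdEncoding.EncodeToString(key),
		hdrKeyMD5:    base64.StdEncoding.EncodeToString(sum[:]),
	}, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	headers, err := ssecHeaders(key)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{hdrAlgorithm, hdrKey, hdrKeyMD5} {
		fmt.Printf("%s: %s\n", name, headers[name])
	}
}
```

Because the server never stores the key, losing it means losing access to the encrypted backups, so the key must be kept safely alongside the backup configuration.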
Docker Images
- etcd-druid: `europe-docker.pkg.dev/gardener-project/releases/gardener/etcd-druid:v0.23.0`