Add Container level WORM immutable object support for Azure Blob Storage. (#781)

* Skip tagged blobs, and add retention expiry data in the snapshot for ABS.

* Fix existing unit tests for ABS.

* Enhance `List` to be able to optionally include tagged snapshots.

* `List` is enhanced to optionally return tagged snapshots.

* Unit tests for tagged snapshots added.

* Update docs to convey immutable snapshot support in Azure Blob Storage.

* Add documentation about ignoring snapshots from restoration through the `"x-etcd-snapshot-exclude"="true"` tag.

* Handle Immutability error responses from the Azure Blob Storage API.
renormalize authored Nov 25, 2024
1 parent 1474ae7 commit 1936760
Showing 7 changed files with 139 additions and 25 deletions.
39 changes: 38 additions & 1 deletion docs/usage/immutable_snapshots.md
@@ -6,13 +6,15 @@ Several cloud providers offer functionality to create immutable objects within t

Currently, etcd-backup-restore supports the use of immutable objects on the following cloud platforms:

- Google Cloud Storage (currently supported)
- Google Cloud Storage
- Azure Blob Storage

## Enabling and using Immutable Snapshots with etcd-backup-restore

Etcd-backup-restore supports immutable objects, typically at what cloud providers call the "bucket level." During the creation of a bucket, it is configured to render objects immutable for a specific duration from the moment of their upload. This feature can be enabled through:

- **Google Cloud Storage**: [Bucket Lock](https://cloud.google.com/storage/docs/bucket-lock)
- **Azure Blob Storage**: [Container-level WORM Policies](https://learn.microsoft.com/en-us/azure/storage/blobs/immutable-container-level-worm-policies)

It is also possible to enable immutability retroactively by making appropriate API calls to your cloud provider, allowing the immutable snapshots feature to be used with existing buckets. For information on such configurations, please refer to your cloud provider's documentation.

@@ -29,3 +31,38 @@ Therefore, it is advisable to configure your garbage collection policies based o
## Storage Considerations

Making objects immutable for extended periods can increase storage costs since these objects cannot be removed once uploaded. Storing outdated snapshots beyond their utility does not significantly enhance recovery capabilities. Therefore, consider all factors before enabling immutability for buckets, as this feature is irreversible once set by cloud providers.

## Ignoring Snapshots From Restoration

There may be cases where operators would like `etcd-backup-restore` to ignore particular snapshots present in the object store during restoration of etcd's data-dir.
When snapshots were mutable, operators could simply delete these snapshots, and any subsequent restoration would not include them.
Once immutability is turned on, however, this is no longer possible.

Various cloud providers offer functionality to attach custom annotations/tags to objects in order to carry additional information. These annotations/tags are orthogonal to the object's content, and therefore do not affect the object itself; the feature is thus available even for immutable objects.

We leverage this feature to signal to etcd-backup-restore to not consider certain snapshots during restoration.
The annotation/tag that is to be added to a snapshot for this is `x-etcd-snapshot-exclude=true`.

You can add these tags for the following providers like so:

- **Google Cloud Storage**: as specified in the [docs](https://cloud.google.com/sdk/gcloud/reference/storage/objects/update?hl=en). (GCS calls this Custom Metadata).

```sh
gcloud storage objects update gs://bucket/your-snapshot --custom-metadata=x-etcd-snapshot-exclude=true
```

or:

Use the Google Cloud Console to add the metadata in the `Custom metadata` section of the object.

- **Azure Blob Storage**: as specified in the [docs](https://learn.microsoft.com/en-us/cli/azure/storage/blob/tag?view=azure-cli-latest#az-storage-blob-tag-set). (ABS calls these tags).

```sh
az storage blob tag set --container-name your-container --name your-snapshot --tags "x-etcd-snapshot-exclude"="true"
```

or:

Use the Azure Portal to add the tag in the `Blob index tags` section of the blob.

Once these annotations/tags are added, etcd-backup-restore will ignore those snapshots during restoration.
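The exclusion behaviour described above can be sketched in Go. The `snapshot` type, `filterSnapshots` function, and names below are simplified stand-ins for the project's actual `brtypes` types, not the real implementation; only the tag key `x-etcd-snapshot-exclude` comes from the source:

```go
package main

import "fmt"

// excludeKey mirrors brtypes.ExcludeSnapshotMetadataKey.
const excludeKey = "x-etcd-snapshot-exclude"

// snapshot is a simplified stand-in for the project's brtypes.Snapshot.
type snapshot struct {
	name string
	tags map[string]string
}

// filterSnapshots mimics the List(includeAll bool) behaviour: snapshots
// tagged with the exclude key are dropped unless includeAll is true.
func filterSnapshots(snaps []snapshot, includeAll bool) []snapshot {
	var out []snapshot
	for _, s := range snaps {
		if !includeAll && s.tags[excludeKey] == "true" {
			continue // ignored, e.g. during restoration
		}
		out = append(out, s)
	}
	return out
}

func main() {
	snaps := []snapshot{
		{name: "Full-00000000-00000010"},
		{name: "Incr-00000010-00000020", tags: map[string]string{excludeKey: "true"}},
	}
	for _, s := range filterSnapshots(snaps, false) {
		fmt.Println(s.name)
	}
}
```

With `includeAll` set to `true` (as garbage collection would use it), the tagged snapshot is returned as well.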
24 changes: 20 additions & 4 deletions pkg/snapshot/snapshotter/garbagecollector.go
@@ -153,7 +153,11 @@ func (ssr *Snapshotter) RunGarbageCollector(stopCh <-chan struct{}) {
continue
}
ssr.logger.Infof("GC: Deleting old full snapshot: %s %v", nextSnap.CreatedOn.UTC(), deleteSnap)
if err := ssr.store.Delete(*nextSnap); err != nil {
if err := ssr.store.Delete(*nextSnap); errors.Is(err, brtypes.ErrSnapshotDeleteFailDueToImmutability) {
// The snapshot is still immutable, attempt to garbage collect it in the next run
ssr.logger.Warnf("GC: Skipping the snapshot: %s, since it is still immutable", nextSnap.SnapName)
continue
} else if err != nil {
ssr.logger.Warnf("GC: Failed to delete snapshot %s: %v", path.Join(nextSnap.SnapDir, nextSnap.SnapName), err)
metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindFull, metrics.LabelSucceeded: metrics.ValueSucceededFalse}).Inc()
@@ -178,7 +182,11 @@ func (ssr *Snapshotter) RunGarbageCollector(stopCh <-chan struct{}) {
snap := snapList[fullSnapshotIndexList[fullSnapshotIndex]]
snapPath := path.Join(snap.SnapDir, snap.SnapName)
ssr.logger.Infof("GC: Deleting old full snapshot: %s", snapPath)
if err := ssr.store.Delete(*snap); err != nil {
if err := ssr.store.Delete(*snap); errors.Is(err, brtypes.ErrSnapshotDeleteFailDueToImmutability) {
// The snapshot is still immutable, attempt to garbage collect it in the next run
ssr.logger.Warnf("GC: Skipping the snapshot: %s, since it is still immutable", snapPath)
continue
} else if err != nil {
ssr.logger.Warnf("GC: Failed to delete snapshot %s: %v", snapPath, err)
metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindFull, metrics.LabelSucceeded: metrics.ValueSucceededFalse}).Inc()
@@ -232,7 +240,11 @@ func (ssr *Snapshotter) GarbageCollectChunks(snapList brtypes.SnapList) (int, br
continue
}
ssr.logger.Infof("GC: Deleting chunk for old snapshot: %s", snapPath)
if err := ssr.store.Delete(*snap); err != nil {
if err := ssr.store.Delete(*snap); errors.Is(err, brtypes.ErrSnapshotDeleteFailDueToImmutability) {
// The snapshot is still immutable, attempt to garbage collect it in the next run
ssr.logger.Warnf("GC: Skipping the snapshot: %s, since it is still immutable", snapPath)
continue
} else if err != nil {
ssr.logger.Warnf("GC: Failed to delete chunk %s: %v", snapPath, err)
metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindChunk, metrics.LabelSucceeded: metrics.ValueSucceededFalse}).Inc()
@@ -269,7 +281,11 @@ func (ssr *Snapshotter) GarbageCollectDeltaSnapshots(snapStream brtypes.SnapList
ssr.logger.Infof("GC: Skipping the snapshot: %s, since its immutability period hasn't expired yet", snapPath)
continue
}
if err := ssr.store.Delete(*snapStream[i]); err != nil {
if err := ssr.store.Delete(*snapStream[i]); errors.Is(err, brtypes.ErrSnapshotDeleteFailDueToImmutability) {
// The snapshot is still immutable, attempt to garbage collect it in the next run
ssr.logger.Warnf("GC: Skipping the snapshot: %s, since it is still immutable", snapPath)
continue
} else if err != nil {
errorCount++
ssr.logger.Warnf("GC: Failed to delete snapshot %s: %v", snapPath, err)
metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
39 changes: 30 additions & 9 deletions pkg/snapstore/abs_snapstore.go
@@ -220,7 +220,6 @@ func absCredentialsFromJSON(jsonData []byte) (*absCredentials, error) {

return absConfig, nil
}

func readABSCredentialFiles(dirname string) (*absCredentials, error) {
absConfig := &absCredentials{}

@@ -284,31 +283,51 @@ func (a *ABSSnapStore) Fetch(snap brtypes.Snapshot) (io.ReadCloser, error) {
}

// List will return sorted list with all snapshot files on store.
func (a *ABSSnapStore) List(_ bool) (brtypes.SnapList, error) {
func (a *ABSSnapStore) List(includeAll bool) (brtypes.SnapList, error) {
prefixTokens := strings.Split(a.prefix, "/")
// Last element of the tokens is backup version
// Consider the parent of the backup version level (Required for Backward Compatibility)
prefix := path.Join(strings.Join(prefixTokens[:len(prefixTokens)-1], "/"))
var snapList brtypes.SnapList

// Prefix is compulsory here, since the container could potentially be used by other instances of etcd-backup-restore
pager := a.client.NewListBlobsFlatPager(&container.ListBlobsFlatOptions{Prefix: &prefix})
pager := a.client.NewListBlobsFlatPager(&container.ListBlobsFlatOptions{Prefix: &prefix,
Include: container.ListBlobsInclude{
Metadata: true,
Tags: true,
Versions: true,
ImmutabilityPolicy: true,
},
})
for pager.More() {
resp, err := pager.NextPage(context.Background())
if err != nil {
return nil, fmt.Errorf("failed to list the blobs, error: %w", err)
}

blob:
for _, blobItem := range resp.Segment.BlobItems {
// process the blobs returned in the result segment
if strings.Contains(*blobItem.Name, backupVersionV1) || strings.Contains(*blobItem.Name, backupVersionV2) {
//the blob may contain the full path in its name including the prefix
blobName := strings.TrimPrefix(*blobItem.Name, prefix)
s, err := ParseSnapshot(path.Join(prefix, blobName))
snapshot, err := ParseSnapshot(*blobItem.Name)
if err != nil {
logrus.Warnf("Invalid snapshot found. Ignoring it:%s\n", *blobItem.Name)
logrus.Warnf("Invalid snapshot found. Ignoring: %s", *blobItem.Name)
} else {
snapList = append(snapList, s)
// Tagged snapshots are not listed when excluded, e.g. during restoration
if blobItem.BlobTags != nil {
for _, tag := range blobItem.BlobTags.BlobTagSet {
// skip this blob
if !includeAll && (*tag.Key == brtypes.ExcludeSnapshotMetadataKey && *tag.Value == "true") {
logrus.Infof("Ignoring snapshot %s due to the exclude tag %q in the snapshot metadata", snapshot.SnapName, *tag.Key)
continue blob
}
}
}
// nil check only necessary for Azurite
if blobItem.Properties.ImmutabilityPolicyExpiresOn != nil {
snapshot.ImmutabilityExpiryTime = *blobItem.Properties.ImmutabilityPolicyExpiresOn
}
snapList = append(snapList, snapshot)
}
}
}
@@ -442,7 +461,9 @@ func (a *ABSSnapStore) blockUploader(wg *sync.WaitGroup, stopCh <-chan struct{},
func (a *ABSSnapStore) Delete(snap brtypes.Snapshot) error {
blobName := path.Join(snap.Prefix, snap.SnapDir, snap.SnapName)
blobClient := a.client.NewBlockBlobClient(blobName)
if _, err := blobClient.Delete(context.Background(), nil); err != nil {
if _, err := blobClient.Delete(context.Background(), nil); bloberror.HasCode(err, bloberror.BlobImmutableDueToPolicy) {
return fmt.Errorf("failed to delete blob %s due to immutability: %w, with provider error: %w", blobName, brtypes.ErrSnapshotDeleteFailDueToImmutability, err)
} else if err != nil {
return fmt.Errorf("failed to delete blob %s with error: %w", blobName, err)
}
return nil
37 changes: 33 additions & 4 deletions pkg/snapstore/abs_snapstore_test.go
@@ -19,12 +19,14 @@ import (
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/container"
"github.com/gardener/etcd-backup-restore/pkg/snapstore"
"k8s.io/utils/ptr"
)

type fakeABSContainerClient struct {
objects map[string]*[]byte
prefix string
mutex sync.Mutex
objects map[string]*[]byte
objectTags map[string]map[string]string
prefix string
mutex sync.Mutex
// a map of blobClients so new clients created to a particular blob refer to the same blob
blobClients map[string]*fakeBlockBlobClient
}
@@ -39,6 +41,16 @@ func (c *fakeABSContainerClient) NewListBlobsFlatPager(o *container.ListBlobsFla
}
}

blobTagSetMap := make(map[string][]*container.BlobTag)
for blobName, blobTags := range c.objectTags {
for key, value := range blobTags {
blobTagSetMap[blobName] = append(blobTagSetMap[blobName], &container.BlobTag{
Key: ptr.To(key),
Value: ptr.To(value),
})
}
}

// keeps count of which page was last returned
index, count := 0, len(names)

@@ -48,7 +60,15 @@
},
// Return one page for each blob
Fetcher: func(_ context.Context, page *container.ListBlobsFlatResponse) (container.ListBlobsFlatResponse, error) {
blobItems := []*container.BlobItem{{Name: &names[index]}}
blobItems := []*container.BlobItem{
{
Name: &names[index],
Properties: &container.BlobProperties{},
BlobTags: &container.BlobTags{
BlobTagSet: blobTagSetMap[names[index]],
},
},
}
index++
return container.ListBlobsFlatResponse{
ListBlobsFlatSegmentResponse: container.ListBlobsFlatSegmentResponse{
@@ -82,6 +102,7 @@ func (c *fakeABSContainerClient) NewBlockBlobClient(blobName string) snapstore.A
c.blobClients[blobName] = &fakeBlockBlobClient{name: blobName,
deleteFn: func() {
delete(c.objects, blobName)
delete(c.objectTags, blobName)
},
checkExistenceFn: func() bool {
_, ok := c.objects[blobName]
@@ -98,6 +119,14 @@
return c.blobClients[blobName]
}

func (c *fakeABSContainerClient) setTags(taggedSnapshotName string, tagMap map[string]string) {
c.objectTags[taggedSnapshotName] = tagMap
}

func (c *fakeABSContainerClient) deleteTags(taggedSnapshotName string) {
delete(c.objectTags, taggedSnapshotName)
}

type fakeBlockBlobClient struct {
name string
staging map[string][]byte
2 changes: 1 addition & 1 deletion pkg/snapstore/gcs_snapstore.go
@@ -305,7 +305,7 @@ func (s *GCSSnapStore) List(includeAll bool) (brtypes.SnapList, error) {

// Check if the snapshot should be ignored
if !includeAll && attr.Metadata[brtypes.ExcludeSnapshotMetadataKey] == "true" {
logrus.Infof("Ignoring snapshot due to exclude tag %q present in metadata on snapshot: %s", brtypes.ExcludeSnapshotMetadataKey, attr.Name)
logrus.Infof("Ignoring snapshot %s due to the exclude tag %q in the snapshot metadata", attr.Name, brtypes.ExcludeSnapshotMetadataKey)
continue
}
attrs = append(attrs, attr)
18 changes: 12 additions & 6 deletions pkg/snapstore/snapstore_test.go
@@ -58,6 +58,7 @@ var _ = Describe("Save, List, Fetch, Delete from mock snapstore", func() {
snap5 brtypes.Snapshot
snapstores map[string]testSnapStore
gcsClient *mockGCSClient
absClient *fakeABSContainerClient
)

BeforeEach(func() {
@@ -116,6 +117,13 @@
objectTags: make(map[string]map[string]string),
}

absClient = &fakeABSContainerClient{
objects: objectMap,
prefix: prefixV2,
blobClients: make(map[string]*fakeBlockBlobClient),
objectTags: make(map[string]map[string]string),
}

snapstores = map[string]testSnapStore{
brtypes.SnapstoreProviderS3: {
SnapStore: NewS3FromClient(bucket, prefixV2, "/tmp", 5, brtypes.MinChunkSize, &mockS3Client{
@@ -130,11 +138,7 @@ var _ = Describe("Save, List, Fetch, Delete from mock snapstore", func() {
objectCountPerSnapshot: 3,
},
brtypes.SnapstoreProviderABS: {
SnapStore: NewABSSnapStoreFromClient(bucket, prefixV2, "/tmp", 5, brtypes.MinChunkSize, &fakeABSContainerClient{
objects: objectMap,
prefix: prefixV2,
blobClients: make(map[string]*fakeBlockBlobClient),
}),
SnapStore: NewABSSnapStoreFromClient(bucket, prefixV2, "/tmp", 5, brtypes.MinChunkSize, absClient),
objectCountPerSnapshot: 1,
},
brtypes.SnapstoreProviderGCS: {
@@ -321,8 +325,10 @@ var _ = Describe("Save, List, Fetch, Delete from mock snapstore", func() {
switch provider {
case brtypes.SnapstoreProviderGCS:
mockClient = gcsClient
case brtypes.SnapstoreProviderABS:
mockClient = absClient
}
if provider == brtypes.SnapstoreProviderGCS {
if provider == brtypes.SnapstoreProviderGCS || provider == brtypes.SnapstoreProviderABS {
// the tagged snapshot should not be returned by the List() call
taggedSnapshot := snapList[0]
taggedSnapshotName := path.Join(taggedSnapshot.Prefix, taggedSnapshot.SnapDir, taggedSnapshot.SnapName)
5 changes: 5 additions & 0 deletions pkg/types/snapstore.go
@@ -61,6 +61,11 @@ const (
ExcludeSnapshotMetadataKey = "x-etcd-snapshot-exclude"
)

var (
// ErrSnapshotDeleteFailDueToImmutability is the error returned when the Delete call fails due to immutability
ErrSnapshotDeleteFailDueToImmutability = fmt.Errorf("ErrSnapshotDeleteFailDueToImmutability")
)

// SnapStore is the interface to be implemented for different
// storage backend like local file system, S3, ABS, GCS, Swift, OSS, ECS etc.
// Only purpose of these implementation to provide CPI layer to
