-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix triggering replication when multiple destinations are set #331
base: development/1.15
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -114,16 +114,19 @@ class ReplicationStatusUpdater { | |
* Determines if an object should be updated based on its replication metadata properties. | ||
* @private | ||
* @param {ObjectMD} objMD - The metadata of the object. | ||
* @param {string} site - The destination site name. | ||
* @returns {boolean} True if the object should be updated. | ||
*/ | ||
_objectShouldBeUpdated(objMD) { | ||
_objectShouldBeUpdated(objMD, site) { | ||
return this.replicationStatusToProcess.some(filter => { | ||
if (filter === 'NEW') { | ||
// Either site specific replication info is missing | ||
// or are initialized with empty fields. | ||
return (!objMD.getReplicationInfo() | ||
|| objMD.getReplicationInfo().status === ''); | ||
|| !objMD.getReplicationSiteStatus(site)); | ||
} | ||
return (objMD.getReplicationInfo() | ||
&& objMD.getReplicationInfo().status === filter); | ||
&& objMD.getReplicationSiteStatus(site) === filter); | ||
}); | ||
} | ||
|
||
|
@@ -172,36 +175,54 @@ class ReplicationStatusUpdater { | |
// codebase easier to maintain and upgrade, as opposed to having multiple branches or versions of | ||
// the code for different schema versions. | ||
objMD = new ObjectMD(JSON.parse(mdRes.Body)); | ||
if (!this._objectShouldBeUpdated(objMD)) { | ||
if (!this._objectShouldBeUpdated(objMD, storageClass)) { | ||
skip = true; | ||
return process.nextTick(next); | ||
} | ||
// Initialize replication info, if missing | ||
// This is particularly important if the object was created before | ||
// enabling replication on the bucket. | ||
if (!objMD.getReplicationInfo() | ||
|| !objMD.getReplicationSiteStatus(storageClass)) { | ||
let replicationInfo = objMD.getReplicationInfo(); | ||
if (!replicationInfo || !replicationInfo.status) { | ||
const { Rules, Role } = repConfig; | ||
const destination = Rules[0].Destination.Bucket; | ||
// set replication properties | ||
const ops = objMD.getContentLength() === 0 ? ['METADATA'] | ||
: ['METADATA', 'DATA']; | ||
const backends = [{ | ||
site: storageClass, | ||
status: 'PENDING', | ||
dataStoreVersionId: '', | ||
}]; | ||
const replicationInfo = { | ||
replicationInfo = { | ||
status: 'PENDING', | ||
backends, | ||
content: ops, | ||
backends: [], | ||
destination, | ||
storageClass, | ||
storageClass: '', | ||
role: Role, | ||
storageType: this.storageType, | ||
storageType: '', | ||
Comment on lines
+197
to
+199
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what are these 2 fields ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that is how these fields are used... my question is really what information do they actually store, what these fields represent (for example, the storageClass of the object is already known, and stored in same for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the code which initializes these in cloudserver:
--> it seems these fields (along with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In Zenko, these fields are duplicates of information we already have, as backbeat/cloudserver have a list of all location information, we could just use the storage class stored in the rules to get the info we want. I think these are more of a relic from S3C that we can't really remove right now as S3C uses them. In CRR, the role is also a list. I don't think this works in Zenko tho. |
||
}; | ||
objMD.setReplicationInfo(replicationInfo); | ||
} | ||
// Update replication info with site specific info | ||
if (objMD.getReplicationSiteStatus(storageClass) === undefined) { | ||
// When replicating to multiple destinations, | ||
// the storageClass and storageType properties | ||
// become comma-separated lists of the storage | ||
// classes and types of the replication destinations. | ||
const storageClasses = objMD.getReplicationStorageClass() | ||
? `${objMD.getReplicationStorageClass()},${storageClass}` : storageClass; | ||
objMD.setReplicationStorageClass(storageClasses); | ||
if (this.storageType) { | ||
const storageTypes = objMD.getReplicationStorageType() | ||
? `${objMD.getReplicationStorageType()},${this.storageType}` : this.storageType; | ||
objMD.setReplicationStorageType(storageTypes); | ||
} | ||
// Add site to the list of replication backends | ||
const backends = objMD.getReplicationBackends(); | ||
backends.push({ | ||
site: storageClass, | ||
status: 'PENDING', | ||
dataStoreVersionId: '', | ||
}); | ||
objMD.setReplicationBackends(backends); | ||
} | ||
|
||
objMD.setReplicationSiteStatus(storageClass, 'PENDING'); | ||
objMD.setReplicationStatus('PENDING'); | ||
|
@@ -273,13 +294,16 @@ class ReplicationStatusUpdater { | |
}), | ||
(repConfig, next) => { | ||
const { Rules } = repConfig; | ||
const storageClass = Rules[0].Destination.StorageClass || this.siteName; | ||
const storageClass = this.siteName || Rules[0].Destination.StorageClass; | ||
if (!storageClass) { | ||
const errMsg = 'missing SITE_NAME environment variable, must be set to' | ||
+ ' the value of "site" property in the CRR configuration'; | ||
this.log.error(errMsg); | ||
return next(new Error(errMsg)); | ||
} | ||
if (!this.siteName) { | ||
this.log.warn(`missing SITE_NAME environment variable, triggering replication to the ${storageClass} storage class`); | ||
} | ||
return eachLimit(versions, this.workers, (i, apply) => { | ||
const { Key, VersionId } = i; | ||
this._markObjectPending(bucket, Key, VersionId, storageClass, repConfig, apply); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may not be correct, if there is more than 1 desitination....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crrExistingObjects
doesn't currently support triggering replication for multiple sites at once. So this would work as it will initialize the replication info when triggering the first site and then append the other sites' info to the fields when the other ones are triggered.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even without triggering multiple sites at once : if a user requests replicatoin to the second "site", this will initialize with the 1st destination (--> trigger replication!), then we will add the second one...
When there is no replicationInfo (i.e. typically when it is empty), should we not just initialize an empty replicationInfo, and let the next block ("Update replication info with site specific info") fill the details for the requested destination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in cloudserver, it seems this is initialized to
bucketMD.replicationConfig.destination
: should we do the same?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird one... So, we currently only support having a single destination bucket for all replication rules of a bucket. When creating the rules/workflows via UI it's even worse, the destination bucket becomes the name of the bucket we are replicating from.
The value of this field is not used in Zenko. When creating a location, Cloudserver initializes a client class for the respective backend (aws/azure/...) that keeps the name of the destination bucket in memory, that's the value we use when replicating and not what's in the replication rule (only the storageClass is used to know which client to use).