More of a question than a bug report but might turn into a feature request...
TL;DR: how is checksumming expected to work in this plugin, both on upload and when using the ichksum command later to verify, given the chunked nature of objects within the resource?
At the moment, as I understand it, a replica of an object is stored in 4MB chunks across the Rados 'bucket'. Therefore, to perform a checksum, the file must be downloaded and reassembled before ichksum can be usefully run against it.
Is that correct? If so, how would ichksum -a be expected to work on a tree with a replication node, meaning that there is more than one copy and one of them is held on the librados back end? For that matter, are tools like iscan and ifsck supported?
I can see that irods/irods#2796 would be useful here, but wondering if there were any other thoughts for ways to ensure data integrity without having to read every file back from the bucket!
Cheers
John
Saw your link back to here from the SoftIron conversation. The rados plugin itself could 'trust' the storage to provide these types of calculations/values. Another option is to not use the plugin at all, and just use unixfilesystem via CephFS (and perhaps grow a setting that itself... trusts the storage for checksum information).
Otherwise, yes, this is a challenge. And I think you're still ahead of it - we haven't faced this question from others yet, even nearly two years after you posted this.
Within iRODS, every checksum computation for every replica is a full read from storage followed by a compute. We have discussed moving the checksum operation out of an RPC API and delegating it to the underlying storage architecture, which may provide quicker and better assurances (e.g. erasure coding) that the data is correct at rest. Given that, we could rely on assurances from Ceph that the data is correct, based on your own configuration of the storage and iRODS.
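To illustrate what "a full read from storage and a compute" means for a chunked replica, here is a minimal sketch, assuming 4MB chunks as described in the question. It hashes the chunks in order as one continuous stream, so the object never has to be reassembled into a file first, but every byte still has to be read back. The output uses the `sha2:` + base64(SHA-256) form iRODS uses for SHA-256 checksums. `split_into_chunks` is a hypothetical stand-in for reading the chunks out of the RADOS pool; it is not the plugin's actual API.

```python
import base64
import hashlib
from typing import Iterable, Iterator

# Chunk size as described in the question; an assumption, not read from the plugin.
CHUNK_SIZE = 4 * 1024 * 1024


def irods_sha2_checksum(chunks: Iterable[bytes]) -> str:
    """Hash an object's chunks, in order, as one continuous byte stream.

    Every byte still has to be read, but the chunks never need to be
    reassembled into a single file before hashing.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # incremental update over each chunk in sequence
    # SHA-256 checksums in iRODS are stored as "sha2:" + base64(digest).
    return "sha2:" + base64.b64encode(h.digest()).decode("ascii")


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Hypothetical stand-in for reading an object's chunks back from RADOS."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


if __name__ == "__main__":
    payload = bytes(range(256)) * 40000  # about 10MB, so three 4MB chunks
    print(irods_sha2_checksum(split_into_chunks(payload)))
```

Hashing chunk by chunk gives the same digest as hashing the whole file, which is why streaming avoids the reassembly step but not the full read. Avoiding the read altogether would need the storage itself to vouch for the data, as discussed above.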