How is data integrity verification handled? #16

kript · 2019-02-22T09:29:48Z

More of a question than a bug report but might turn into a feature request...

TL;DR: how is checksumming expected to work in this plugin, both on upload and when using the ichksum command later to verify, given the chunked nature of objects within the resource?

At the moment, as I understand it, a replica of an object is stored in 4MB chunks across the Rados 'bucket'. Therefore, to perform a checksum, the file must be downloaded and reassembled before ichksum can be usefully run against it.

Is that correct? If so, how would ichksum -a be expected to work on a tree with a replication node, meaning that there is more than one copy and one of them is held on the librados back end? For that matter, are tools like iscan and ifsck supported?

I can see that irods/irods#2796 would be useful here, but I'm wondering whether there are any other thoughts on ways to ensure data integrity without having to read every file back from the bucket!

Cheers

John

trel · 2020-11-10T03:43:52Z

Saw your link back to here from the SoftIron conversation. The rados plugin itself could 'trust' the storage to provide these types of calculations/values. Another option is to not use the plugin at all, and just use unixfilesystem via CephFS (and perhaps grow a setting that itself... trusts the storage for checksum information).

Otherwise, yes, this is a challenge. And I think you're still ahead of it - we haven't faced this question from others yet, even nearly two years after you posted this.

jasoncoposky · 2020-11-10T04:08:24Z

Within iRODS, every checksum computation for every replica is a full read from storage and a compute. We have discussed moving the checksum operation out of an RPC API and delegating it to the underlying storage architecture, which may provide quicker and better assurances (e.g. erasure coding) that the data is correct at rest. Given that, we could rely on assurances from Ceph that the data is correct, given your own configuration of the storage and iRODS.
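
To illustrate what "a full read from storage and a compute" means for an object held in 4MB chunks, here is a minimal sketch in Python. It is not the plugin's code or the iRODS implementation: `split_into_chunks` and `checksum_chunks` are hypothetical helpers, the chunks are plain in-memory byte strings rather than Rados objects, and the `sha2:` plus base64 output format is an assumption about how iRODS typically presents SHA-256 checksums. The point is only that the chunks can be hashed in order without reassembling the file first, although every byte still has to be read back from storage.

```python
import base64
import hashlib
from typing import Iterable, List

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size mentioned in this thread


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Hypothetical stand-in for striping an object into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def format_checksum(digest: bytes) -> str:
    """Assumed iRODS-style presentation: 'sha2:' followed by base64 of the digest."""
    return "sha2:" + base64.b64encode(digest).decode("ascii")


def checksum_chunks(chunks: Iterable[bytes]) -> str:
    """Hash chunks in their original order, one at a time.

    Only one chunk is held in memory at once, but this is still a full read:
    every chunk has to be fetched from the back end.
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return format_checksum(hasher.digest())


def checksum_whole(data: bytes) -> str:
    """Reference result: hash the fully reassembled object in one call."""
    return format_checksum(hashlib.sha256(data).digest())
```

Feeding the chunks to `checksum_chunks` in order gives the same value as `checksum_whole` on the reassembled bytes, so a reassembled copy on disk is not strictly needed. What this sketch cannot avoid is the read itself, which is where delegating to the storage layer's own guarantees would help.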