Clarify hashing algorithm used for relatedResource examples #1570
Comments
FWIW, I tried running the algorithm here: https://www.w3.org/TR/controller-document/#example-an-implementation-of-the-general-base-encoding-algorithm-above-in-javascript, and while I get the same result as with a naïve script of my own, that result does not equate to the targetDigest.
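Roughly, the naïve pipeline I mean is this (sketch only, assuming Node 18+ and the multiformats npm package; the SHA-256 choice is just a guess on my part):

```js
// Sketch: hash the exact bytes served for the context URL with SHA-256 and
// multibase-encode the digest as base64url (the 'u' prefix is added by the codec).
import { createHash } from 'node:crypto';
import { base64url } from 'multiformats/bases/base64';

const res = await fetch('https://www.w3.org/ns/credentials/v2');
const bytes = new Uint8Array(await res.arrayBuffer());

const digest = createHash('sha256').update(bytes).digest();
console.log(base64url.encode(digest));
```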
We recently moved the definition of Multihash from Data Integrity to the Controller Document, specifically this section: https://www.w3.org/TR/controller-document/#multihash

Would referring to that section help?
Hi @msporny, I'm sorry, I must be missing something in the trail of information. The digestMultibase of the VC v2 context is defined here: https://www.w3.org/TR/vc-data-model-2.0/#example-use-of-the-relatedresource-and-digestmultibase-properties and its value is uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz. As it starts with a u, I take the value to be base64url (no padding) encoded, so I guess my issue is with the hashing part.

In this section, https://www.w3.org/TR/vc-data-model-2.0/#defn-relatedResource, it says that SHA-384 should be preferred, so that's what I'm using (SHA-2). I get the same byte array when hashing the VC v2 context with SHA-384, consistently across 2 different libraries (roughly as in the sketch below). But when I encode this hash to base64url with DB's library, the result does not match the published value. I'm missing a piece of the puzzle and I can't figure out what it is. Can you guide me to your result? Thanks
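For concreteness, here is roughly what I am doing (a sketch; using node:crypto and Web Crypto as the two libraries is just my setup, not something the spec requires):

```js
// Sketch: hash the VC v2 context with SHA-384 using two independent
// implementations, then naively base64url-encode the raw digest with a 'u' prefix.
import { createHash, webcrypto } from 'node:crypto';

const res = await fetch('https://www.w3.org/ns/credentials/v2');
const bytes = new Uint8Array(await res.arrayBuffer());

const d1 = createHash('sha384').update(bytes).digest();                  // library 1: node:crypto
const d2 = Buffer.from(await webcrypto.subtle.digest('SHA-384', bytes)); // library 2: Web Crypto

console.log(d1.equals(d2));                   // both libraries agree on the byte array
console.log('u' + d1.toString('base64url'));  // does not match the published digestMultibase
```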
I expect that the Multihash values are base64url-encoded SHA-256 values, not SHA-384 (see the sketch at the end of this comment). I'll try to check here in a bit... it might also be that I messed up generating the example, since this part of the spec is fairly new and hasn't gotten much review.

There is a disagreement in the WG as to whether SHA-384 really should be the suggested target hash value. The digestSRI folks seem to want 384, but many of the others in the WG don't feel that's necessary, given that the energy required to break SHA-256 is the equivalent of repurposing every computer on the planet for billions of years, or boiling every ocean on the planet 16,000+ times over. That said, I don't think the VCWG has it in them to debate the point more than it already has.
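If it is indeed SHA-256, the expected value would be the multibase base64url encoding of a SHA-256 multihash (hash codec + length prefix + digest), along these lines (a sketch, not the spec tooling):

```js
// Sketch: SHA-256 multihash (0x12 codec + length prefix + digest bytes),
// wrapped in multibase base64url ('u' prefix).
import { sha256 } from 'multiformats/hashes/sha2';
import { base64url } from 'multiformats/bases/base64';

const res = await fetch('https://www.w3.org/ns/credentials/v2');
const bytes = new Uint8Array(await res.arrayBuffer());

const mh = await sha256.digest(bytes);     // MultihashDigest { code: 0x12, size: 32, ... }
console.log(base64url.encode(mh.bytes));   // 'u' + base64url(full multihash bytes)
```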
Yeah, the test vectors were wrong/old. :( I updated the spec to auto-generate the values instead of depending on humans to do it. The Multibase and Multihash encodings are also now explicitly specified in each example title. The commit that implemented that is here: 04576fa

Does that address your concerns, @lemoustachiste?
@msporny yes, I think that's way clearer like this, thanks... but I am still unable to find the correct algorithm to produce the same result as what's displayed. :/ Do you have a link to the script that handles the hashing and the encoding?
Yes, here's the code that's running in the spec: https://github.com/w3c/respec-vc/blob/main/index.js#L376-L467

There is a chance that there is a bug in my code, so I'm very interested to hear if you're able to reproduce the output. Ideally, we'd need to figure this out by the end of this year to ensure that the correct test vectors are in the specification. Thank you for looking into this; independent verification of these test vectors is very important!
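In rough terms, the generation for each related resource amounts to something like this (a sketch from memory, not the code as written; double-check against the link above):

```js
// Sketch: generate both digest forms for a related resource.
import { createHash } from 'node:crypto';
import { sha256 } from 'multiformats/hashes/sha2';
import { base64url } from 'multiformats/bases/base64';

async function digestsFor(url) {
  const bytes = new Uint8Array(await (await fetch(url)).arrayBuffer());

  // digestSRI: "sha384-" + standard base64 of the raw SHA-384 digest.
  const digestSRI = 'sha384-' + createHash('sha384').update(bytes).digest('base64');

  // digestMultibase: multibase base64url of the SHA-256 multihash bytes.
  const mh = await sha256.digest(bytes);
  const digestMultibase = base64url.encode(mh.bytes);

  return { digestSRI, digestMultibase };
}

console.log(await digestsFor('https://www.w3.org/ns/credentials/v2'));
```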
Hi @msporny, I was able to review what happens in the code and figure out where my discrepancy lies. Here is my assumed flow for verifying the hash of a context file: load the context document, hash it with the stated algorithm, encode the digest with multibase, and compare the result against the published digestMultibase value.

I was puzzled for a bit, but I figured the mismatch is because of white space (see the sketch at the end of this comment), so I then tried hashing the raw file contents rather than my in-memory copy of the context. What I'm thinking is that those white-space characters shouldn't necessarily be accounted for as part of the hash, as they don't exactly represent significant data as far as JSON-LD contexts are concerned. Technically, in the browser, it would require verifier instances to hold physical files at a predictable path in order to load the data from the file (hence keeping the white space) rather than reading it from memory as instantiated by the script. In practice the latter is the obvious, simpler option, albeit one that brings more weight into packages, a necessary trade-off to prevent various HTTP attacks on context-file retrieval.

So, to prevent this kind of hashing-comparison mistake, a normalization should occur before hashing, and only the meaningful data should be hashed. Maybe just white-space trimming? I'm not sure if there is a more clever way to do it, such as a JSON-LD normalization, but for context files.
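To make the failure mode concrete, this is roughly the difference I ran into (a sketch; variable names are mine):

```js
// Sketch: the digest of the exact bytes on the wire differs from the digest of a
// re-serialized copy of the same JSON, because formatting whitespace is lost.
import { createHash } from 'node:crypto';

const res = await fetch('https://www.w3.org/ns/credentials/v2');
const raw = Buffer.from(await res.arrayBuffer());

// Same JSON data, different bytes: indentation and line breaks are gone.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(raw.toString('utf8'))));

const hashOfRaw = createHash('sha256').update(raw).digest('base64url');
const hashOfReserialized = createHash('sha256').update(reserialized).digest('base64url');

console.log(hashOfRaw === hashOfReserialized); // false: only the raw bytes match a published digest
```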
@lemoustachiste The hashes provided are for the exact file bytes, with the white space, blank lines, and formatting they use. What you are probably looking for are hashes of the context files in JSON Canonicalization Scheme (JCS, RFC 8785) form. For reasons I don't recall, there was pushback about providing additional hashes for the JCS form. Once the contexts are locked down, it would be straightforward for a trusted third party or a special tool to provide JCS-based or other hashes.

That being said, and not knowing the details of what you are doing, there may be an easy solution. If the contexts are pre-loaded, their hashes could be pre-checked and included in the bundle as well. When checking resource hashes at runtime, if the id and hash match what's in the known list, then no further check is needed when using the pre-loaded data.
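A rough sketch of that pattern (names and structure are made up for illustration):

```js
// Sketch: contexts are bundled with digests that were verified at build time;
// at runtime the verifier only checks membership in the known list and uses
// the bundled copy, instead of fetching and re-hashing the resource.
const KNOWN_CONTEXTS = new Map([
  ['https://www.w3.org/ns/credentials/v2', {
    digestMultibase: 'uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz', // value quoted in this issue
    document: { /* bundled copy of the context object */ },
  }],
]);

function resolveRelatedResource(id, expectedDigest) {
  const known = KNOWN_CONTEXTS.get(id);
  if (known && known.digestMultibase === expectedDigest) {
    return known.document; // pre-checked at build time; no runtime hashing needed
  }
  // Unknown resource or digest mismatch: fall back to fetch + hash check (not shown).
  throw new Error(`Unverified related resource: ${id}`);
}
```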
Yes, what @davidlehn says above. Some developers had strong pushback w.r.t. doing any sort of canonicalization, so we're stuck with what we have now (trying to canonicalize in any way will lead to objections). Yes, it doesn't make much sense and causes a single byte of whitespace to throw the hash off, but that's where we are right now, and the approach won't change (because of the threat of objections from this group of developers).

I think we've done all we can in this issue. Do you have any objections to closing the issue, @lemoustachiste? (We're required to close/address all issues before going to the next stage in the standardization process.)
I think the workaround can work but it seems like a sort of unofficial recipe, which I believe could lead to some issues in terms of interoperability and I guess false negatives.
Exactly. As a developer and implementer myself, I would have preferred a clear and canonical way to ensure the data matches as expected, so that, with grounded rules, everyone knows what to expect and how to get there. You can close this issue, but I believe it will come back. Thanks
We say the following in the spec today:
Does the above match your expectation for "clear and canonical way"? It uses standard tooling available on most every operating system out there. What more than that could we do?
Well, clearly we don't want that either... :) I'm out of ideas on what more we could do here: if we do canonicalization, people in the WG (and some developers) will object. If we keep what's there, you're saying we're not providing a clear and canonical way (though I don't know how we could be more clear about it). If you could suggest some spec text, that might help us figure out how to address your concern to your liking.
In this section, https://www.w3.org/TR/vc-data-model-2.0/#integrity-of-related-resources, the VC v2 context https://www.w3.org/ns/credentials/v2 has this corresponding digestMultibase value: uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz

I am trying to implement logic in my verifier to check the assertion; however, using js-multiformats (https://www.npmjs.com/package/multiformats), I cannot find the correct steps to arrive at the same result. Is the hashing algorithm for contexts and other resources detailed somewhere (https://www.w3.org/TR/controller-document/#algorithms maybe)? Or better yet, is there an available library to do these conversions?

All in all, I think adding a link to the expected algorithm in the VC data model spec would greatly help implementers know how to get back to the expected values.
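For reference, here is roughly how I have been poking at the published value with js-multiformats (a sketch; I am only inspecting what the value claims to be, not asserting what it should be):

```js
// Sketch: decode the multibase value and try to read it as a multihash, to see
// which hash algorithm (multicodec code) and digest length it declares.
import { base64url } from 'multiformats/bases/base64';
import { decode as decodeMultihash } from 'multiformats/hashes/digest';

const value = 'uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz';

// The 'u' prefix means multibase base64url (no padding); decode() expects the prefixed string.
const bytes = base64url.decode(value);
console.log(bytes.length);

try {
  const mh = decodeMultihash(bytes); // e.g. code 0x12 = sha2-256, 0x20 = sha2-384
  console.log(mh.code.toString(16), mh.size);
} catch (e) {
  console.log('not a parseable multihash:', e.message);
}
```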
Thanks