
Proposal: Document ETag and Precondition Semantics #250

Open
jonjohnsonjr opened this issue Mar 12, 2021 · 11 comments

@jonjohnsonjr
Contributor

ETags and the precondition headers that use them (If-Match, If-None-Match) are already part of HTTP, so we don't really need to define them (IMO), but it would be nice if clients could rely on them.

We discussed this on the dev call a few weeks ago. I'm filing this as a placeholder for an upcoming pull request that will clearly document the expected behavior so that registry implementations can start to follow it.

@jonjohnsonjr
Contributor Author

@awakecoding might make sense to continue your ETag discussion here.

@awakecoding

@jonjohnsonjr @SteveLasker excellent, let's discuss it here! Here is my primary goal: find a way to keep browser Etag compatibility (opaque value) while also passing a non-opaque Etag-like digest-based value, making it possible to implement caching without the need for associative data attached to a specific file, especially for manifests.

By this I mean being able to correctly recover the ETag-like value associated with a cached manifest file on disk without additional metadata to store the corresponding ETag. This is very similar to how blobs are already addressed by digest, except made usable for manifests, and it would greatly simplify efficient manifest update checks without needlessly pulling the manifest contents.

The standard ETag HTTP header is opaque by definition. Browsers and HTTP caches implement this mechanism such that if you cache an HTTP response along with its ETag value, you can repeat the same request including that ETag value (in If-None-Match) and avoid receiving the response contents if they haven't changed. This mechanism is almost perfect, aside from the fact that the ETag value is opaque by definition, meaning we can't simply declare it to be digest-based (and therefore non-opaque). A proper ETag implementation doesn't try to interpret the ETag value; it just stores it as a tag alongside the cached data.
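A minimal Python sketch of the opaque-ETag behavior described above, assuming a hypothetical `fetch(url, headers)` callable that stands in for a real HTTP client and returns `(status, headers, body)`. The cache never parses the ETag; it only replays it in If-None-Match and reuses the stored body when the server answers 304:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]
Fetch = Callable[[str, Dict[str, str]], Response]  # hypothetical HTTP client


def header(headers: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup (servers send "ETag" or "Etag")."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


@dataclass
class CacheEntry:
    etag: str  # stored verbatim, never parsed or interpreted
    body: bytes


class ConditionalCache:
    """Client-side cache that treats ETags as opaque tokens."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> bytes:
        request_headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Replay the stored ETag exactly as the server sent it.
            request_headers["If-None-Match"] = cached.etag
        status, headers, body = self._fetch(url, request_headers)
        if status == 304 and cached is not None:
            return cached.body  # unchanged since last time: reuse local copy
        etag = header(headers, "ETag")
        if etag is not None:
            self._entries[url] = CacheEntry(etag, body)
        return body
```

Note that this only works if the ETag is kept next to every cached file, which is exactly the extra metadata the digest-based idea in this comment tries to avoid.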

This being said, ETag values are often digest-based, even though you cannot rely on them being digest-based. Here are the ETag and Docker-Content-Digest headers from a response I captured in Fiddler between the ORAS CLI and Azure Container Registry:

Docker-Content-Digest: sha256:083aa17315658e6c368cd46d58dc9e257c230e29b359c5b92008fe476955e099
Etag: "sha256:083aa17315658e6c368cd46d58dc9e257c230e29b359c5b92008fe476955e099"

For some odd reason, the ETag is the same as the Docker-Content-Digest except that it is wrapped in double quotes. Looking at these values, it is tempting to just interpret them as digests anyway, but we would need a standard way to disambiguate between opaque and non-opaque values passed in the ETag header, so it's still a dangerous thing to try. Even then, a server may not implement non-opaque ETag headers, which would result in constant ETag mismatches if the client keeps rebuilding the value from the content digests. For that reason, ETag headers have to remain opaque.

Here is my suggestion: use a header very similar to "Docker-Content-Digest" that works like the ETag header, except that it is defined to be non-opaque and digest-based. Implementations supporting this ETag variant would interpret its value as a digest. The same value would be inserted into the ETag header, but one wouldn't be able to assume that the standard ETag header is digest-based. This would make it possible for implementations to work using only the "OCI-Content-Digest" header, where the value is not stored in metadata but simply recovered by rehashing the local manifest contents.
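A rough sketch of the rehashing idea, assuming sha256 manifests and treating `OCI-Content-Digest` as the proposed (not yet specified) header; the same comparison works against `Docker-Content-Digest` today. The digest is recomputed from the exact bytes on disk, so nothing besides the manifest file needs to be stored:

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """sha256 digest of the exact bytes on disk, in "algorithm:hex" form."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def is_up_to_date(cached_manifest: bytes, advertised_digest: str) -> bool:
    """True if the cached manifest matches the digest the registry advertised.

    `advertised_digest` would come from the proposed OCI-Content-Digest header
    (hypothetical) or, today, from Docker-Content-Digest. Only sha256 is
    handled in this sketch.
    """
    algorithm, _, _ = advertised_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    return manifest_digest(cached_manifest) == advertised_digest
```

This only holds if the client stores the manifest byte-for-byte as received; re-serializing the JSON (even just whitespace changes) produces a different digest.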

Any thoughts?

@jonjohnsonjr
Contributor Author

jonjohnsonjr commented Mar 24, 2021

For some odd reason, the ETag is the same as the Docker-Content-Digest except that it is within double quotes.

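I believe this is a requirement from the RFC: RFC 7232 defines an entity-tag as an opaque string enclosed in double quotes, optionally prefixed with W/ to mark a weak validator.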
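For illustration, a small Python sketch of formatting a digest as an ETag and splitting an ETag back into its weak flag and opaque value, following the RFC 7232 entity-tag grammar. Parsing only recovers the opaque string; treating that string as a digest is still only safe if the registry guarantees it:

```python
import re
from typing import Tuple

# entity-tag = [ "W/" ] DQUOTE *etagc DQUOTE   (RFC 7232, section 2.3)
# etagc      = %x21 / %x23-7E / obs-text        (visible chars except '"')
_ENTITY_TAG = re.compile(r'(W/)?"([\x21\x23-\x7e\x80-\xff]*)"')


def format_etag(digest: str) -> str:
    """Wrap a digest in the double quotes the ETag syntax requires."""
    if not re.fullmatch(r'[\x21\x23-\x7e]*', digest):
        raise ValueError("digest contains characters not allowed in an ETag")
    return f'"{digest}"'


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an ETag header value into (is_weak, opaque_value)."""
    match = _ENTITY_TAG.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed entity-tag: {value!r}")
    return match.group(1) is not None, match.group(2)
```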

Here is my suggestion: use a header very similar to "Docker-Content-Digest" that works like the ETag header, except that it is defined to be non-opaque and digest-based. Implementations supporting this ETag variant would interpret its value as a digest. The same value would be inserted into the ETag header, but one wouldn't be able to assume that the standard ETag header is digest-based. This would make it possible for implementations to work using only the "OCI-Content-Digest" header, where the value is not stored in metadata but simply recovered by rehashing the local manifest contents.

This is somewhat reasonable to me, but I wonder if we can avoid defining a new header. It seems to me that Docker-Content-Digest does serve this purpose, more or less? For certain things (like conditional requests), you would have to hit the registry first to know, for sure, what the registry's Etag would be. We could avoid that roundtrip if the Etag was deterministic and non-opaque, but we'd still require registries to adopt that new header, so you'd still want to perform that roundtrip for safety unless you know the registry implements this behavior.

I would be fine with some SHOULD language around making the Etag value deterministic based on the digest of the content. This would enable the use cases you have in mind for registries that already implement this behavior. I vaguely understand how this might not play nice with some caches and proxies, but I don't see a clear downside to doing it this way. What can go wrong if clients assume Etag: "<digest>"? Is that worse than what we have today?

I could also see something similar to #251 (comment) where we have registries return a header to indicate that they follow this deterministic etag behavior.
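One way a client could use such an indicator, sketched in Python. The capability header name `OCI-ETag-Is-Digest` is purely hypothetical (nothing in this thread or the spec defines it), as are the function names:

```python
import hashlib
from typing import Dict, Optional

# Hypothetical header a registry might send to advertise that its manifest
# ETags are always the quoted content digest. Not part of any spec.
CAPABILITY_HEADER = "OCI-ETag-Is-Digest"


def registry_uses_digest_etags(response_headers: Dict[str, str]) -> bool:
    for name, value in response_headers.items():
        if name.lower() == CAPABILITY_HEADER.lower():
            return value.strip().lower() == "true"
    return False


def if_none_match_value(
    cached_manifest: bytes,
    stored_etag: Optional[str],
    digest_etags: bool,
) -> Optional[str]:
    """Choose the If-None-Match value for a manifest revalidation request.

    With digest-based ETags the value is rebuilt from the bytes on disk, so
    no per-file metadata is needed. Otherwise fall back to the opaque ETag
    saved from an earlier response (None means an unconditional request).
    """
    if digest_etags:
        return '"sha256:' + hashlib.sha256(cached_manifest).hexdigest() + '"'
    return stored_etag
```

If a registry claimed digest ETags but didn't actually produce them, If-None-Match would simply never match and the client would re-download the manifest every time: wasteful, but not incorrect. With If-Match on a write, the same mismatch would instead surface as spurious 412 Precondition Failed responses.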

@sajayantony
Member

  1. Can we also clarify whether the ETag is bound to the pushed manifest, i.e. that down-conversion of a manifest (say, from a manifest list to a single manifest) will not affect the ETag?
  2. Second, can we also state whether ETags should or should not be assumed to be equivalent once the manifest moves from one location to another? For example, is moving a manifest from one repo to another, or to another registry, expected to guarantee that ETags stay the same? With digests this is not an issue (but down-conversions are).

@jonjohnsonjr
Contributor Author

  1. Can we also clarify whether the ETag is bound to the pushed manifest, i.e. that down-conversion of a manifest (say, from a manifest list to a single manifest) will not affect the ETag?

I think down-conversion affecting the ETag is fine and it probably should. If you are an old client that gets served a down-converted manifest, you probably should not be mutating the tag to point to a new value based on what was returned to you. As a client, if you're unable to even pull down the manifest in its "pushed" form, modifying it is almost certainly unsafe.

For example, is moving a manifest from one repo to another, or to another registry, expected to guarantee that ETags stay the same?

I think that's implicit because the Etag header comes from the origin registry, but I would be happy to add some language to make it more explicit that Etags are only valid for the registry that served them?
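If ETags are only valid for the registry that served them, a client cache can make that explicit by scoping stored ETags to the registry host. A minimal sketch; `ManifestKey` and `ETagStore` are hypothetical names, not part of any client library:

```python
from typing import Dict, NamedTuple, Optional


class ManifestKey(NamedTuple):
    registry: str  # host that served the ETag; ETags are scoped to it
    repository: str
    reference: str  # tag or digest


class ETagStore:
    """Stores ETags per (registry, repository, reference)."""

    def __init__(self) -> None:
        self._etags: Dict[ManifestKey, str] = {}

    def remember(self, key: ManifestKey, etag: str) -> None:
        self._etags[key] = etag

    def lookup(self, key: ManifestKey) -> Optional[str]:
        # A manifest copied to another registry or repo gets no ETag here,
        # even if its digest is identical.
        return self._etags.get(key)
```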

@awakecoding

@jonjohnsonjr aren't Etag headers something that can also be inserted by caching proxies? I wonder if caching proxies can also optionally decide to replace Etag headers with their own if the origin already set an Etag. Because they are by definition opaque it's hard to predict what can be done with them in all cases.

@jonjohnsonjr
Contributor Author

aren't Etag headers something that can also be inserted by caching proxies?

https://tools.ietf.org/html/rfc2616?spm=5176.doc32013.2.3.Aimyd7#section-13.5.2

A transparent proxy MUST NOT modify any of the following fields in a
request or response, and it MUST NOT add any of these fields if not
already present:

  • Content-Location

  • Content-MD5

  • ETag

  • Last-Modified

To your point, though, we may want to talk about the Cache-Control header as well? For something like appending to a manifest list, clients would probably want to include Cache-Control: no-cache when interrogating the current value of a manifest by tag.
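A sketch of the append-to-a-manifest-list flow as a read-modify-write guarded by If-Match, with Cache-Control: no-cache on the read. `FakeRegistry` is a hypothetical in-memory stand-in that uses the quoted digest as its ETag; a real registry would need the check-and-swap on PUT to be atomic, which a replicated backend may not be able to guarantee:

```python
import hashlib
import json
import threading
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def quoted_digest(body: bytes) -> str:
    """ETag for a manifest: its sha256 digest wrapped in double quotes."""
    return '"sha256:' + hashlib.sha256(body).hexdigest() + '"'


class FakeRegistry:
    """Hypothetical in-memory stand-in for one tag's manifest endpoint."""

    def __init__(self, initial: bytes) -> None:
        self._lock = threading.Lock()
        self._body = initial

    def get(self, headers: Dict[str, str]) -> Response:
        # A real registry or proxy would honour Cache-Control; this fake
        # always returns the current value.
        with self._lock:
            return 200, {"ETag": quoted_digest(self._body)}, self._body

    def put(self, headers: Dict[str, str], body: bytes) -> int:
        # Compare and swap under one lock so no other writer can slip in
        # between the precondition check and the update.
        with self._lock:
            expected = headers.get("If-Match")
            if expected is not None and expected != quoted_digest(self._body):
                return 412  # Precondition Failed: the tag moved underneath us
            self._body = body
            return 201


def append_to_index(registry: FakeRegistry, descriptor: dict,
                    max_attempts: int = 3) -> bool:
    """Append a descriptor to the index behind a tag, retrying on conflicts."""
    for _ in range(max_attempts):
        # no-cache: revalidate with the origin rather than trust a cached copy.
        _status, headers, body = registry.get({"Cache-Control": "no-cache"})
        index = json.loads(body)
        index.setdefault("manifests", []).append(descriptor)
        updated = json.dumps(index).encode()
        if registry.put({"If-Match": headers["ETag"]}, updated) == 201:
            return True
    return False
```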

@sajayantony
Member

@sargun - I just checked with ACR and it does use digest-based ETags. It would be good to describe the S3 issue on this thread.

curl -s -v \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
  "https://$REGISTRY/v2/$REPO/manifests/$TAG"
...
< Docker-Content-Digest: sha256:6c4eb7e726bfbdb44eb2c40747792ce85bc646837c6e5683af8466b2a651cc22
< Docker-Distribution-Api-Version: registry/2.0
< Etag: "sha256:6c4eb7e726bfbdb44eb2c40747792ce85bc646837c6e5683af8466b2a651cc22"

@sudo-bmitch
Contributor

The concern raised was that a replicated registry server with an S3 backend cannot guarantee the current state of the ETag, since another replica of the registry server may be modifying the tag pointer at the same time. The next step is to follow up with the distribution maintainers on the feasibility of implementing this.

@jdolitsky
Member

punting to 1.2

@ecki

ecki commented Jun 5, 2022

One of the most important checks a registry client performs regularly is whether a tag has changed (all other artifacts are immutable). So an HTTP-compatible way to do this should be mentioned explicitly in the spec, especially since there is currently no way (no OCI-branded header) to do that without the “Docker-” headers.

Unfortunately, for the obvious usage of the digest as the ETag together with HEAD and/or a conditional GET, the spec must either state that clients need to cache the ETag value independently of the hash, OR define the format of the ETag (which is something of a layering violation).

Personally, I am in favor of specifying that GET and HEAD on a tag must return the digest as the ETag, and that GET should support If-None-Match headers.

Another question is whether Last-Modified should be mandated.

BTW, it’s unfortunate that you can’t see the digest a tag points to in the list API (also mentioned here: #320 (comment)).
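A minimal sketch of the GET/HEAD behavior proposed in this comment, on the registry side: the digest is returned as the ETag, and a matching If-None-Match yields 304 Not Modified. The function names and the `(status, headers, body)` shape are assumptions for illustration, and Last-Modified is left out:

```python
import hashlib
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def _strip_weak(tag: str) -> str:
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison used for If-None-Match (RFC 7232, section 3.2).

    Splitting on commas is naive in general, but fine for digest ETags,
    which never contain commas.
    """
    if if_none_match.strip() == "*":
        return True
    return any(_strip_weak(candidate) == _strip_weak(etag)
               for candidate in if_none_match.split(","))


def serve_manifest(method: str, manifest: bytes, media_type: str,
                   request_headers: Dict[str, str]) -> Response:
    """Handle GET or HEAD for a manifest, using its digest as the ETag."""
    digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
    headers = {
        "Content-Type": media_type,
        "Docker-Content-Digest": digest,
        "ETag": f'"{digest}"',
    }
    if_none_match: Optional[str] = next(
        (v for k, v in request_headers.items() if k.lower() == "if-none-match"),
        None,
    )
    if if_none_match is not None and etag_matches(if_none_match, headers["ETag"]):
        return 304, headers, b""  # Not Modified: the client's copy is current
    headers["Content-Length"] = str(len(manifest))
    body = manifest if method == "GET" else b""  # HEAD sends headers only
    return 200, headers, body
```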
