config: forbid snapshotting of data volumes #504
Do we need language clarifying that "copying up" from the layer before into the volume is allowed? |
So you want me to change the previously line, because currently copy-up is forbidden:
|
@cyphar In Docker today, you can create an image with a volume such that it will "copy up" before running. If the new copy is modified, it won't be included in the image. |
this raises the point that "data volumes" are not defined in image-spec (they presumably rely on the Docker concept but don't even link there). Can we add some more verbiage here describing what it means? |
@stevvooe Alright, I'll update the wording to include a note saying that
@jonboulle Yeah, I'll add some words about it, but I won't reference the Docker docs because they specifically refer to the Docker implementation. Something like:
|
On Wed, Dec 21, 2016 at 05:10:11AM -0800, Aleksa Sarai wrote:
> Data volumes are directories within the rootfs of an image which
> are intended to store external (possibly persistent) data in the
> rootfs of a container based on the image. Any changes to a data
> volume will not be included in further layers based off the image.
I still prefer focusing on mount points [1], but if we want to go down
the “data volumes” path we probably need:
* To link this sentence's “Data volumes” with the config's ‘Volumes’
if we ever grow a layer generator (e.g. opencontainers/image-tools#8).
* To replace “further layers based off the image” with “the layer”.
The data-volume masking also applies to the initial layer, so
“further” doesn't always apply. And “the image” is a pretty fuzzy
term. The important context is “the updated rootfs, the previous
rootfs snapshot (if any), and the set of data-volume directories”,
but that's a bit of a mouthful ;).
[1]: #496 (comment)
|
This is already done (on
I will use the term "layer changeset based on the image". The term "image" is not fuzzy -- you are creating a layer based on an image. As for the context, I'm not sure what you mean by your comment. |
On Wed, Dec 21, 2016 at 12:32:42PM -0800, Aleksa Sarai wrote:
> > * To link this sentence's “Data volumes” with the config's
> >   ‘Volumes’ if we ever grow a volume generator
> >   (e.g. opencontainers/image-tools#8).
> This is already done, `Config.Volume` is defined as "data volumes".
I expect layer implementors to read layer.md, but:
$ git grep -i 'data\|volume' origin/pr/504 -- layer.md
…no hits…
Since this PR is restricting layer generation, I think it has to land
wording about the new restriction in [1]. And it would be nice if the
config and layer references cross-linked each other to make
discovering the relationship easier.
> > * To replace “further layers based off the image” with “the layer”.
> >   The data-volume masking also applies to the initial layer, so
> >   “further” doesn't always apply. And “the image” is a pretty fuzzy
> >   term. The important context is “the updated rootfs, the previous
> >   rootfs snapshot (if any), and the set of data-volume directories”,
> >   but that's a bit of a mouthful ;).
> I will use the term "layer changeset based on the image". The term
> "image" is not fuzzy -- you are creating a layer based on an
> image.
Always? Why can't you just create a layer in a vacuum?
The current API in flight with opencontainers/image-tools#8 takes
arguments for the child directory and parent directory. There's no
reference to anything more image-y. If this PR lands, that API will
need a new argument. What will that new argument look like?
‘--config PATH_TO_IMAGE_CONFIG_JSON’? ‘--data-volumes DIR[,DIR...]’?
‘--image WHATEVER_AN_IMAGE_IS’?
> As for the context, I'm not sure what you mean by your
> comment.
Just “these are the parameters for the layer-generation algorithm”.
I'm not clear on what “an image” is, but I'm pretty sure it has more
information than you need for layer generation.
[1]: https://github.com/opencontainers/image-spec/blob/v1.0.0-rc3/layer.md#determining-changes
|
Maintainers asked me to put it in
You can, but then there's two cases:
So okay, not "always" but the case where you don't have an image configuration is effectively just creating a tar archive... |
On Wed, Dec 21, 2016 at 12:49:41PM -0800, Aleksa Sarai wrote:
> > Since this PR is restricting layer generation, I think it has to
> > land wording about the new restriction in [1].
> Maintainers asked me to put it in `config.md` but I can include it
> there too. Problem is that it has a bunch of consequences for
> cross-linking.
Which is why I prefer focusing on bind mounts ;). But if we want to
link the config to the layer-generation spec via a “data volumes”
idea, I think we need the cross-linking to make the relationship very
obvious.
> 1. You have a configuration, but there's no layers. In which case
>    you just follow the configuration. This is like doing `umoci
>    init` then `umoci new`.
Does that touch layers at all? If not, how does it relate to this PR?
> 2. You have no configuration, and you're just creating a tar archive
>    of a directory. In which case, there's no `Config.Volume` for you
>    to worry about.
And:
3. You have a set of data volume paths (which you got from somewhere)
and a rootfs and you want to make the initial layer. This is like
tar --exclude …, possibly modulo mount-point stubs.
4. You have a set of data volume paths (which you got from somewhere),
a previous snapshot, and a rootfs and want to make a non-initial
layer. This touches both [1] and the data-volume exclusion you're
proposing here.
[1]: https://github.com/opencontainers/image-spec/blob/v1.0.0-rc3/layer.md#determining-changes
|
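Cases 3 and 4 above can be read as one function of three inputs: the updated rootfs, the previous snapshot (empty for an initial layer), and the set of data-volume directories. The following is only a minimal sketch of that reading, not an algorithm taken from the spec or from any tool. The path-to-digest dictionaries and the names `is_masked` and `changeset` are assumptions made for illustration, and whiteout handling is left out entirely.

```python
from pathlib import PurePosixPath


def is_masked(path, volumes):
    """Return True if path lies strictly inside one of the volume directories.

    The volume directory itself is not masked, so it can remain in the
    layer as an empty mount-point stub.
    """
    p = PurePosixPath(path)
    return any(PurePosixPath(v) in p.parents for v in volumes)


def changeset(parent, child, volumes=()):
    """Compare two rootfs snapshots and skip anything inside a data volume.

    parent and child map absolute paths to content digests (a hypothetical
    representation). Pass an empty parent to describe an initial layer.
    """
    changes = {}
    for path, digest in child.items():
        if is_masked(path, volumes):
            continue
        if path not in parent:
            changes[path] = "added"
        elif parent[path] != digest:
            changes[path] = "modified"
    for path in parent:
        if path not in child and not is_masked(path, volumes):
            changes[path] = "deleted"
    return changes
```

With an empty `parent` this degenerates to the `tar --exclude` behaviour of case 3. With a non-empty `parent` it is case 4, where both the change detection and the volume exclusion apply.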
You could implement volumes without bindmounts or mounts at all, just by excluding changes to the relevant directory. Should you? Probably not, but defining that a volume must be a mount is a bit too strong, and is also restrictive on implementors. Not to mention that
Both of your examples (3 and 4) are effectively saying "I have an image configuration somewhere that I'm generating a changeset for, but I'm not going to give you the configuration for some reason". I don't understand what use case you're fighting for with this distinction between an image configuration and the process for generating layers. You need to have
How about I write a PR with some actual language and then we can discuss that, as opposed to commenting on a single phrase in a paragraph that I wrote off the top of my head (as an example of kind-of what I will write) before going to bed?
I mean, ultimately I will probably choose wording that doesn't have "the base image" purely because I would then have to define that as well. But please let me actually write something before you start getting pedantic about wording that I wrote as an example... 😩 |
On Wed, Dec 21, 2016 at 01:37:44PM -0800, Aleksa Sarai wrote:
> How about I write a PR with some actual language and then we can
> discuss that…
+1
|
@cyphar What is going on with this? |
@stevvooe Has concerns about this potentially conflicting with existing use cases and implementations. |
more from the call yesterday, i feel like this is like trying to pin down the exact flags to use with the |
@vbatts @cyphar How do we move this one forward? After some thought, I think there needs to be clear recommendation about what layers mean, but not necessarily how they are created. Generically, one should mask off volume declarations in the image diff creation process, but that seems like it could be purely up to the user. Perhaps, this PR would work if it was a SHOULD. |
i think i'm inclined to say this should be a SHOULD as well. because as a general rule this is the case, though sure-enough there would be some feature that asks to override and include a default fileset in where the volume would be, so if the volume is not present then the app "does the right thing" or some non-sense, and the MUST would make it non-compliant. |
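One way to picture the SHOULD reading discussed in the two comments above is a layer builder that masks volume contents by default but lets the caller opt out. This is a rough sketch using Python's standard `tarfile` module. The function name `build_layer` and the `mask_volumes` switch are hypothetical and do not correspond to any flag in image-tools or umoci.

```python
import io
import tarfile
from pathlib import PurePosixPath


def build_layer(rootfs, volumes=(), mask_volumes=True):
    """Return the bytes of a tar archive of rootfs.

    By default, entries inside a declared volume are skipped, while the
    volume directory itself is kept. Passing mask_volumes=False is the
    override that a MUST would forbid and a SHOULD would allow.
    """
    # Archive member names are relative, so drop the leading slash.
    masks = [PurePosixPath(v.strip("/")) for v in volumes]

    def keep(info):
        path = PurePosixPath(info.name)
        if mask_volumes and any(m in path.parents for m in masks):
            return None  # returning None excludes this entry
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(rootfs, arcname=".", filter=keep)
    return buf.getvalue()
```

The override covers the case raised above, where an image deliberately ships a default fileset at the volume path.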
Alright, I'm going to bump this one with SHOULD so we can get it into |
This is necessary in order to make sure that unpackers all have sane behaviour when it comes to handling image repacking and layer generation. Since data volumes are generally bindmounts from sources external to the image, it is not a good idea to snapshot said data -- and thus we should forbid it. Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar I still believe this language to be too strong. It is imposing requirements on a build process that doesn't really exist. OCI doesn't really create any requirements for how one image is related to another. The way this is written, it also opens up the possibility for a Volume to be a file, which hasn't really ever existed. |
True, but we need to encode some semantic meaning in Config.Volumes so that it actually does mask diffs. We could make it only affect extraction but that won't actually help solve the whole "volume containing a secret getting snapshotted" problem which this current PR solves. In particular I want to make it so that the whole idea that " I won't lie, I actually missed this issue when implementing
Personally I don't see that as a negative. 😉 But we could deny that if you really want. Ultimately "data volume" is so ill-specified in this spec I really don't know what an implementation should reasonably do in the absence of prior implementations to compare against. |
@cyphar this language prohibits actual use-cases i'm aware of. Not for copying/committing FROM a volume, but rather copying TO a volume
[vbatts@bananaboat] /tmp/tmp.GhZqUJ$ docker build -t f .
Sending build context to Docker daemon 3.584 kB
Step 1 : FROM fedora
---> 15895ef0b3b2
Step 2 : RUN mkdir /data && touch /data/README
---> Running in 3c2c12dd5864
---> 5b1376684e24
Removing intermediate container 3c2c12dd5864
Step 3 : VOLUME /data
---> Running in 4c5f67eeee33
---> 9a9ec38af080
Removing intermediate container 4c5f67eeee33
Successfully built 9a9ec38af080
[vbatts@bananaboat] /tmp/tmp.GhZqUJ$ docker run -v /data/ --rm -it f sh
sh-4.3# ls /data/
README
and this makes a new
I realize this is confusing language to define in the spec, but your proposed language prohibits this existing functionality. :- |
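The "copying TO a volume" behaviour in the session above can be modelled roughly as follows. This is a simplified sketch of the idea and not Docker's implementation. The function name `copy_up` and its arguments are hypothetical, and it assumes copy-up only happens when the volume directory is empty.

```python
import shutil
from pathlib import Path


def copy_up(rootfs, volume_path, volume_dir):
    """Seed an empty volume directory with the image's files at volume_path.

    rootfs is the unpacked image root, volume_path is the declared volume
    (for example "/data"), and volume_dir is the external directory that
    will be mounted there. Later writes go to volume_dir only, so they
    never reach the rootfs that a layer would be generated from.
    """
    source = Path(rootfs) / volume_path.lstrip("/")
    target = Path(volume_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: only an empty volume is seeded, existing data is untouched.
    if source.is_dir() and not any(target.iterdir()):
        shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

In this model the `README` created in the image layer shows up in the volume, while anything written to the volume afterwards stays outside the image.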
The old language does too. If we want to change it we should, but in any case we still need to fix up this wording anyway because the old text is problematic already. |
fair
…On Fri, May 26, 2017 at 10:35 AM Aleksa Sarai ***@***.***> wrote:
this language prohibits actual use-cases i'm aware of
The old language does too. If we want to change it we should, but in any
case we still need to fix up this wording anyway because the old text is
problematic *already*.
|
This is (unfortunately) not mandated in the specification[1,2], but in order to avoid accidentally spilling private information into published layers (which is one use of Volumes) we must ignore all layers included in Config.Volumes. In future we should also add some flags or alternative ways of masking paths. [1]: opencontainers/image-spec#496 [2]: opencontainers/image-spec#504 Signed-off-by: Aleksa Sarai <[email protected]>
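For a tool that wants to apply the masking described in the commit message above, the list of paths would come from the image configuration, where `config.Volumes` is a JSON object keyed by path. The sketch below only shows that lookup. The helper name `volume_paths` is hypothetical and this is not umoci's code.

```python
import json


def volume_paths(config_json):
    """Return the sorted data-volume paths declared in an image configuration.

    config_json is the text of an image configuration document. A missing
    or null "config" or "Volumes" field is treated as "no volumes".
    """
    config = json.loads(config_json).get("config") or {}
    return sorted((config.get("Volumes") or {}).keys())
```

The resulting list is what a layer generator would treat as its set of masked directories.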
Closing in favour of #694. |
This is necessary in order to make sure that unpackers all have sane
behaviour when it comes to handling image repacking and layer
generation. Since data volumes are generally bindmounts from sources
external to the image, it is not a good idea to snapshot said data --
and thus we should forbid it.
Closes #496
Signed-off-by: Aleksa Sarai [email protected]