Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: forbid snapshotting of data volumes #504

Closed
wants to merge 1 commit into from
Closed

config: forbid snapshotting of data volumes #504

wants to merge 1 commit into from

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Dec 18, 2016

This is necessary in order to make sure that unpackers all have sane
behaviour when it comes to handling image repacking and layer
generation. Since data volumes are generally bindmounts from sources
external to the image, it is not a good idea to snapshot said data --
and thus we should forbid it.

Closes #496
Signed-off-by: Aleksa Sarai [email protected]

@stevvooe
Copy link
Contributor

Do we need language clarifying that "copying up" from the layer before into the volume is allowed?

@cyphar
Copy link
Member Author

cyphar commented Dec 20, 2016

So you want me to change the previously line, because currently copy-up is forbidden:

If a file or folder exists within the image with the same path as a data volume, that file or folder is replaced with the data volume and is never merged.

@stevvooe
Copy link
Contributor

@cyphar In Docker today, you can create an image with a volume such that it will "copy up" before running. If the new copy is modified, it won't be included in the image.

@jonboulle
Copy link
Contributor

this raises the point that "data volumes" are not defined in image-spec (they presumably rely on the Docker concept but don't even link there). Can we add some more verbiage here describing what it means?

@cyphar
Copy link
Member Author

cyphar commented Dec 21, 2016

@stevvooe Alright, I'll update the wording to include a note saying that copyup is permitted by a runtime (though I don't like that we talk about runtimes here...), just that diff generation is not permitted. The thing I don't like about this is that it's a very runtime-specific thing, and honestly I don't like how Docker has two different volume modes (--volumes-from and -v).

@jonboulle Yeah, I'll add some words about it, but I won't reference the Docker docs because they specifically refer to the Docker implementation. Something like:

Data volumes are directories within the rootfs of an image which are intended to store external (possibly persistent) data in the rootfs of a container based on the image. Any changes to a data volume will not be included in further layers based off the image.

@wking
Copy link
Contributor

wking commented Dec 21, 2016 via email

@cyphar
Copy link
Member Author

cyphar commented Dec 21, 2016

@wking

I still prefer focusing on mount points [1], but if we want to go down
the “data volumes” path we probably need:

This is already done (on master), Config.Volume is defined as "data volumes".

  • To replace “further layers based off the image” with “the layer”.
    The data-volume masking also applies to the initial layer, so
    “further” doesn't always apply. And “the image” is a pretty fuzzy
    term. The important context is “the updated rootfs, the previous
    rootfs snapshot (if any), and the set of data-volume directories”,
    but that's a bit of a mouthful ;).

I will use the term "layer changeset based on the image". The term "image" is not fuzzy -- you are creating a layer based on an image. As for the context, I'm not sure what you mean by your comment.

@wking
Copy link
Contributor

wking commented Dec 21, 2016 via email

@cyphar
Copy link
Member Author

cyphar commented Dec 21, 2016

Since this PR is restricting layer generation, I think it has to land wording about the new restriction in [1].

Maintainers asked me to put it in config.md but I can include it there too. Problem is that it has a bunch of consequences for cross-linking.

Always? Why can't you just create a layer in a vacuum?

You can, but then there's two cases:

  1. You have a configuration, but there's no layers. In which case you just follow the configuration. This is like doing umoci init then umoci new.

  2. You have no configuration, and you're just creating a tar archive of a directory. In which case, there's no Config.Volume for you to worry about.

So okay, not "always" but the case where you don't have an image configuration is effectively just creating a tar archive...

@wking
Copy link
Contributor

wking commented Dec 21, 2016 via email

@cyphar
Copy link
Member Author

cyphar commented Dec 21, 2016

Which is why I prefer focusing on bind mounts ;).

You could implement volumes without bindmounts or mounts at all, just by excluding changes to the relevant directory. Should you? Probably not, but defining that a volume must be a mount is a bit too strong, and is also restrictive on implementors.

Not to mention that umoci (as an example) won't use bindmounts because they require either running containers or having root. So there's an example of an existing implementation where bindmount wording won't work.

And:

Both of your examples (3 and 4) are effectively saying "I have an image configuration somewhere that I'm generating a changeset for, but I'm not going to give you the configuration for some reason". I don't understand what the usecase is that you're fighting for this distinction between an image configuration and the process for generating layers. You need to have DiffIDs in order to extract the layers (well it's actually Layers in the manifest), you need to have Volume in order to know which directories are to be ignored.

How about I write a PR with some actual language and then we can discuss that, as opposed to commenting on a single phrase in a paragraph that I wrote off the top of my head (as an example of kind-of what I will write) before going to bed? I mean, ultimately I will probably choose wording that doesn't have "the base image" purely because I would then have to define that as well. But please let me actually write something before you start getting pedantic about wording that I wrote as an example... 😩

@wking
Copy link
Contributor

wking commented Dec 21, 2016 via email

@philips
Copy link
Contributor

philips commented Jan 18, 2017

@cyphar What is going on with this?

@RobDolinMS
Copy link
Collaborator

@stevvooe Has concerns about this potentially conflicting with existing use cases and implementations.

@vbatts
Copy link
Member

vbatts commented Jan 19, 2017

more from the call yesterday, i feel like this is like trying to pin down the exact flags to use with the find command (i.e which files to match on; whether to stay on the device or traverse mounts; etc). If there is verbiage added, it should be more instructive like "it's recommended to not include child content from volume paths in an image layer". Which is still a little confusing, and should be clarified that it does not mean that content can still be copied from the volume to facilitate the building of the layer.

@stevvooe
Copy link
Contributor

@vbatts @cyphar How do we move this one forward?

After some thought, I think there needs to be clear recommendation about what layers mean, but not necessarily how they are created. Generically, one should mask off volume declarations in the image diff creation process, but that seems like it could be purely up to the user.

Perhaps, this PR would work if it was a SHOULD.

@vbatts
Copy link
Member

vbatts commented May 3, 2017

i think i'm inclined to say this should be a SHOULD as well. because as a general rule this is the case, though sure-enough there would be some feature that asks to override and include a default fileset in where the volume would be, so if the volume is not present then the app "does the right thing" or some non-sense, and the MUST would make it non-compliant.

@cyphar
Copy link
Member Author

cyphar commented May 22, 2017

Alright, I'm going to bump this one with SHOULD so we can get it into 1.0.0 (this is quite important IMO even though it's only informative -- because of how it plays with #492).

This is necessary in order to make sure that unpackers all have sane
behaviour when it comes to handling image repacking and layer
generation. Since data volumes are generally bindmounts from sources
external to the image, it is not a good idea to snapshot said data --
and thus we should forbid it.

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar
Copy link
Member Author

cyphar commented May 22, 2017

Updated, PTAL @stevvooe @vbatts. I also switched a will to a MUST but I can drop that if you want.

@stevvooe
Copy link
Contributor

@cyphar I still believe this language to be too strong. It is imposing requirements on build process that doesn't really exist. OCI doesn't really create any requirements for how one image is related to another.

The way this is written, it also opens up the possibility for a Volume to be a file, which hasn't really ever existed.

@cyphar
Copy link
Member Author

cyphar commented May 23, 2017

OCI doesn't really create any requirements for how one image is related to another.

True, but we need to encode some semantic meaning in Config.Volumes so that it actually does mask diffs. We could make it only affect extraction but that won't actually help solve the whole "volume containing a secret getting snapshotted" problem which this current PR solves. In particular I want to make it so that the whole idea that "docker commit doesn't cross volume boundaries" is actually enforced by implementations.

I won't lie, I actually missed this issue when implementing umoci (and I have yet to fix it because I want this language to be specified first).

it also opens up the possibility for a Volume to be a file, which hasn't really ever existed.

Personally I don't see that as a negative. 😉 But we could deny that if you really want. Ultimately "data volume" is so ill-specified in this spec I really don't know what an implementation should reasonably do in the absence of prior implementations to compare against.

@vbatts
Copy link
Member

vbatts commented May 25, 2017

@cyphar this language prohibits actual use-cases i'm aware of. Not for copying/commiting FROM a volume, but rather copying TO a volume
moby/moby#17470 (comment)

[vbatts@bananaboat] /tmp/tmp.GhZqUJ$ docker build -t f .
Sending build context to Docker daemon 3.584 kB
Step 1 : FROM fedora
 ---> 15895ef0b3b2
Step 2 : RUN mkdir /data && touch /data/README
 ---> Running in 3c2c12dd5864
 ---> 5b1376684e24
Removing intermediate container 3c2c12dd5864
Step 3 : VOLUME /data
 ---> Running in 4c5f67eeee33
 ---> 9a9ec38af080
Removing intermediate container 4c5f67eeee33
Successfully built 9a9ec38af080
[vbatts@bananaboat] /tmp/tmp.GhZqUJ$ docker run -v /data/ --rm -it f sh
sh-4.3# ls /data/
README

and this makes a new docker volume ls visible volume.

I realize this is confusing language to define in the spec, but your proposed language prohibits this existing functionality. :-
I'm inclined to close this and we can revisit for a future version.

@cyphar
Copy link
Member Author

cyphar commented May 26, 2017

this language prohibits actual use-cases i'm aware of

The old language does too. If we want to change it we should, but in any case we still need to fix up this wording anyway because the old text is problematic already.

@vbatts
Copy link
Member

vbatts commented May 26, 2017 via email

cyphar added a commit to opencontainers/umoci that referenced this pull request Jun 3, 2017
This is (unfortunately) not mandated in the specification[1,2], but in
order to avoid accidentally spilling private information into published
layers (which is one use of Volumes) we must ignore all layers included
in Config.Volumes.

In future we should also add some flags or alernative ways of masking
paths.

[1]: opencontainers/image-spec#496
[2]: opencontainers/image-spec#504

Signed-off-by: Aleksa Sarai <[email protected]>
cyphar added a commit to opencontainers/umoci that referenced this pull request Jun 3, 2017
This is (unfortunately) not mandated in the specification[1,2], but in
order to avoid accidentally spilling private information into published
layers (which is one use of Volumes) we must ignore all layers included
in Config.Volumes.

In future we should also add some flags or alernative ways of masking
paths.

[1]: opencontainers/image-spec#496
[2]: opencontainers/image-spec#504

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar
Copy link
Member Author

cyphar commented Jun 25, 2017

Closing in favour of #694.

@cyphar cyphar closed this Jun 25, 2017
@cyphar cyphar deleted the 496-handle-volumes-in-changesets branch June 25, 2017 12:05
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    • [ ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants