Add support for creating a composefs from a directory #36
Changes from all commits
b480adc
3893594
b556397
3ff7435
b7c5c7a
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# How to create a composefs from an OCI image | ||
|
||
This document is incomplete. It only serves to document some decisions we've | ||
taken about how to resolve ambiguous situations. | ||
|
||
# Data precision | ||
|
||
We currently create a composefs image using the granularity of data as | ||
typically appears in OCI tarballs: | ||
- atime and ctime are not present (these are actually not physically present | ||
in the erofs inode structure at all, either the compact or extended forms) | ||
- mtime is set to the mtime in seconds; the sub-seconds value is simply | ||
truncated (ie: we always round down). erofs has an nsec field, but it's not | ||
normally present in OCI tarballs. That's down to the fact that the usual | ||
tar header only has timestamps in seconds and extended headers are not | ||
usually added for this purpose. | ||
- we take great care to faithfully represent hardlinks: even though the | ||
produced filesystem is read-only and we have data de-duplication via the | ||
objects store, we make sure that hardlinks result in an actual shared inode | ||
as visible via the `st_ino` and `st_nlink` fields on the mounted filesystem. | ||
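As a rough illustration of what the hardlink guarantee means in practice, the following sketch creates a hardlink on an ordinary filesystem and compares the inode number and link count reported for the two names. It assumes GNU `stat` (for the `-c` option); the scratch directory and the file names `a` and `b` are made up for the example. The same check should hold on a mounted composefs image for files that were hardlinked in the source.

```sh
set -eu

# Print "<inode number> <link count>" for a path (GNU stat syntax, an assumption).
ino_and_nlink() {
    stat -c '%i %h' "$1"
}

# Hypothetical scratch directory, used only for this demonstration.
demo_dir="$(mktemp -d)"
echo 'hello' > "${demo_dir}/a"
ln "${demo_dir}/a" "${demo_dir}/b"

# Both names should report the same inode number and a link count of 2.
a_info="$(ino_and_nlink "${demo_dir}/a")"
b_info="$(ino_and_nlink "${demo_dir}/b")"
echo "a: ${a_info}"
echo "b: ${b_info}"

rm -rf "${demo_dir}"
```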
|
||
We apply these precision restrictions also when creating images by scanning the | ||
filesystem. For example: even if we get more-accurate timestamp information, | ||
we'll truncate it to whole seconds. | ||
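A minimal sketch of the truncation rule, assuming the timestamp arrives as a decimal string of the form `seconds.fraction` with a non-negative seconds part (for timestamps before the epoch, dropping the fraction would round towards zero rather than down). The function name `truncate_mtime` is invented for this example and is not part of cfsctl.

```sh
# Drop the sub-second part of a "seconds[.fraction]" timestamp.
# Assumes a non-negative value, so cutting the fraction rounds down.
truncate_mtime() {
    printf '%s\n' "${1%%.*}"
}

truncate_mtime 1700000000.999999999   # prints 1700000000
truncate_mtime 1700000000             # prints 1700000000
```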
|
||
# Merging directories | ||
|
||
This is done according to the OCI spec, with an additional clarification: in | ||
case a directory entry is present in multiple layers, we use the tar metadata | ||
from the most-derived layer to determine the attributes (owner, permissions, | ||
mtime) for the directory. | ||
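The rule above amounts to "the last layer to mention a directory wins". The sketch below assumes a made-up input format: one line per directory entry, in layer order from the base layer to the most-derived one, with the path and an opaque attribute string separated by a tab. Neither the format nor the function name `merge_dir_attrs` comes from cfsctl.

```sh
# Read "path<TAB>attributes" lines in layer order (base layer first) and
# print one line per path, carrying the attributes seen last, that is,
# those from the most-derived layer that contains the directory.
merge_dir_attrs() {
    awk -F '\t' '{ attrs[$1] = $2 } END { for (p in attrs) print p "\t" attrs[p] }' |
        sort
}

# /usr appears in two layers; the later (most-derived) entry is kept.
printf '%s\t%s\n' \
    /usr 'root:root 0755 100' \
    /etc 'root:root 0755 100' \
    /usr 'root:root 0555 200' |
    merge_dir_attrs
```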
|
||
# The root inode | ||
|
||
The root inode (/) is a difficult case because it doesn't always appear in the | ||
layer tarballs. We need to make some arbitrary decisions about the metadata. | ||
|
||
Here's what we do: | ||
|
||
- if any layer tarball contains an entry for '/' then we'd like to use it. | ||
The code for this doesn't exist yet, but it seems reasonable as a principle. | ||
In case the `/` entry were to appear in multiple layers, we'd use the | ||
most-derived layer in which it is present (as per the logic in the previous | ||
section). | ||
- otherwise: | ||
- we assume that the root directory is owned by root:root and has `a+rx` | ||
permissions (ie: `0555`). This matches the behaviour of podman. Note in | ||
particular: podman uses `0555`, not `0755`: the root directory is not | ||
(nominally) writable by the root user. | ||
- the mtime of the root directory is taken to be equal to the most recent | ||
file in the entire system, that is: the highest numerical value of any | ||
mtime on any inode. The rationale is that this is usually a very good | ||
proxy for "when was the (most-derived) container image created". |
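The fallback rule for the root directory can be sketched as follows, assuming the mtimes of all inodes are available as whole seconds, one per line on standard input. The function names `newest_mtime` and `default_root_metadata` and the output format are hypothetical; the values (owner 0:0, mode `0555`, newest mtime) are the ones described above.

```sh
# Print the highest mtime (whole seconds) read from standard input.
newest_mtime() {
    sort -n | tail -n 1
}

# Print fallback metadata for '/' when no layer provides an entry for it:
# owned by root:root (0:0), mode 0555, mtime of the newest inode.
default_root_metadata() {
    printf 'uid=0 gid=0 mode=0555 mtime=%s\n' "$(newest_mtime)"
}

printf '%s\n' 1700000000 1650000000 1712345678 | default_root_metadata
# prints: uid=0 gid=0 mode=0555 mtime=1712345678
```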
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
/cfsctl | ||
/extra/usr/lib/dracut/modules.d/37composefs/composefs-pivot-sysroot | ||
/fix-verity.efi | ||
/image.qcow2 | ||
/tmp/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Need 6.12 kernel from rawhide | ||
FROM fedora:rawhide AS base | ||
COPY extra / | ||
COPY cfsctl /usr/bin | ||
RUN --mount=type=cache,target=/var/cache/libdnf5 <<EOF | ||
set -eux | ||
|
||
# we should install kernel-modules here, but can't | ||
# because it'll pull in the entire kernel with it | ||
# it seems to work fine for now.... | ||
dnf --setopt keepcache=1 install -y \ | ||
composefs \ | ||
dosfstools \ | ||
policycoreutils-python-utils \ | ||
selinux-policy-targeted \ | ||
skopeo \ | ||
strace \ | ||
systemd \ | ||
util-linux | ||
systemctl enable systemd-networkd | ||
semanage permissive -a systemd_gpt_generator_t # for volatile-root workaround | ||
passwd -d root | ||
mkdir /sysroot | ||
EOF | ||
|
||
FROM base AS kernel | ||
RUN --mount=type=bind,from=base,target=/mnt/base <<EOF | ||
set -eux | ||
|
||
mkdir -p /tmp/sysroot/composefs | ||
COMPOSEFS_FSVERITY="$(cfsctl --repo /tmp/sysroot create-image /mnt/base)" | ||
|
||
mkdir -p /etc/kernel /etc/dracut.conf.d | ||
echo "composefs=${COMPOSEFS_FSVERITY} rw" > /etc/kernel/cmdline | ||
EOF | ||
RUN --mount=type=cache,target=/var/cache/libdnf5 <<EOF | ||
# systemd-boot-unsigned: ditto | ||
# btrfs-progs: dracut wants to include this in the initramfs | ||
# ukify: dracut doesn't want to take our cmdline args? | ||
dnf --setopt keepcache=1 install -y kernel btrfs-progs systemd-boot-unsigned systemd-ukify | ||
EOF | ||
|
||
# This could (better?) be done from cfsctl... | ||
FROM base AS bootable | ||
COPY --from=kernel /boot /composefs-meta/boot | ||
# RUN rm -rf /composefs-meta | ||
# RUN commands touch /run unfortunately | ||
COPY empty /.wh.composefs-meta |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
#!/bin/sh | ||
|
||
set -eux | ||
|
||
cd "${0%/*}" | ||
|
||
cargo build --release | ||
|
||
cp ../../target/release/cfsctl . | ||
cp ../../target/release/composefs-pivot-sysroot extra/usr/lib/dracut/modules.d/37composefs/ | ||
CFSCTL='./cfsctl --repo tmp/sysroot/composefs' | ||
|
||
rm -rf tmp | ||
mkdir -p tmp/sysroot/composefs tmp/sysroot/var | ||
|
||
# mkdir tmp/internal-sysroot # for debugging | ||
# podman build -v $(pwd)/tmp/internal-sysroot:/tmp/sysroot:z,U --iidfile=tmp/iid "$@" . | ||
# | ||
podman build --iidfile=tmp/iid "$@" . | ||
|
||
IMAGE_ID="$(sed s/sha256:// tmp/iid)" | ||
podman save --format oci-archive -o tmp/final.tar "${IMAGE_ID}" | ||
${CFSCTL} oci pull oci-archive:tmp/final.tar | ||
IMAGE_FSVERITY="$(${CFSCTL} oci create-image "${IMAGE_ID}")" | ||
|
||
mkdir -p tmp/efi/loader | ||
echo 'timeout 3' > tmp/efi/loader/loader.conf | ||
mkdir -p tmp/efi/EFI/BOOT tmp/efi/EFI/systemd | ||
cp /usr/lib/systemd/boot/efi/systemd-bootx64.efi tmp/efi/EFI/systemd | ||
cp /usr/lib/systemd/boot/efi/systemd-bootx64.efi tmp/efi/EFI/BOOT/BOOTX64.EFI | ||
${CFSCTL} oci prepare-boot "${IMAGE_ID}" tmp/efi | ||
|
||
fakeroot ./make-image | ||
qemu-img convert -f raw tmp/image.raw -O qcow2 image.qcow2 | ||
./fix-verity image.qcow2 # https://github.com/tytso/e2fsprogs/issues/201 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../run/systemd/resolve/stub-resolv.conf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# we want to make sure the virtio disk drivers get included | ||
hostonly=no | ||
Review comment: This type of stuff is also in the fedora-bootc base image. |
||
|
||
# we need to force these in via the initramfs because we don't have modules in | ||
# the base image | ||
force_drivers+=" virtio_net vfat " |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Copyright (C) 2013 Colin Walters <[email protected]> | ||
Review comment: I think this can be dropped, I am not sure I'd consider it "derived enough" honestly.
Review comment: ...and it's already been copied from the other two copies of it already kicking around in the |
||
# | ||
# This library is free software; you can redistribute it and/or | ||
# modify it under the terms of the GNU Lesser General Public | ||
Review comment: Related to above probably for consistency we should use the overall repo license. But I guess this is all a demo that may end up in a separate repo anyways. |
||
# License as published by the Free Software Foundation; either | ||
# version 2 of the License, or (at your option) any later version. | ||
# | ||
# This library is distributed in the hope that it will be useful, | ||
# but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | ||
# Lesser General Public License for more details. | ||
# | ||
# You should have received a copy of the GNU Lesser General Public | ||
# License along with this library. If not, see <https://www.gnu.org/licenses/>. | ||
|
||
[Unit] | ||
DefaultDependencies=no | ||
ConditionKernelCommandLine=composefs | ||
ConditionPathExists=/etc/initrd-release | ||
After=sysroot.mount | ||
Requires=sysroot.mount | ||
Before=initrd-root-fs.target | ||
Before=initrd-switch-root.target | ||
|
||
OnFailure=emergency.target | ||
OnFailureJobMode=isolate | ||
|
||
[Service] | ||
Type=oneshot | ||
ExecStart=/usr/bin/composefs-pivot-sysroot | ||
StandardInput=null | ||
StandardOutput=journal | ||
StandardError=journal+console | ||
RemainAfterExit=yes |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#!/usr/bin/bash | ||
|
||
check() { | ||
return 0 | ||
} | ||
|
||
depends() { | ||
return 0 | ||
} | ||
|
||
install() { | ||
inst \ | ||
"${moddir}/composefs-pivot-sysroot" /bin/composefs-pivot-sysroot | ||
inst \ | ||
"${moddir}/composefs-pivot-sysroot.service" \ | ||
"${systemdsystemunitdir}/composefs-pivot-sysroot.service" | ||
|
||
$SYSTEMCTL -q --root "${initdir}" add-wants \ | ||
'initrd-root-fs.target' 'composefs-pivot-sysroot.service' | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
layout = uki | ||
uki_generator = ukify |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
[Match] | ||
Type=ether | ||
|
||
[Link] | ||
RequiredForOnline=routable | ||
|
||
[Network] | ||
DHCP=yes | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Make sure we grow the right root filesystem | ||
|
||
[Service] | ||
ExecStart= | ||
ExecStart=/usr/lib/systemd/systemd-growfs /sysroot | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
#!/bin/sh | ||
|
||
# workaround for https://github.com/tytso/e2fsprogs/issues/201 | ||
|
||
set -eux | ||
|
||
# We use a custom UKI with an initramfs containing a script that remounts | ||
# /sysroot read-write and enables fs-verity on all of the objects in | ||
# /composefs/objects. | ||
# | ||
# The first time we're run (or if we are modified) we (re-)generate the UKI. | ||
# This is done inside of a container (for independence from the host OS). | ||
|
||
image_file="$1" | ||
|
||
if [ "$0" -nt fix-verity.efi ]; then | ||
podman run --rm -i fedora > tmp/fix-verity.efi <<'EOF' | ||
set -eux | ||
|
||
cat > /tmp/fix-verity.sh <<'EOS' | ||
mount -o remount,rw /sysroot | ||
( | ||
cd /sysroot/composefs/objects | ||
echo >&2 'Enabling fsverity on composefs objects' | ||
for i in */*; do | ||
fsverity enable "$i" | ||
done | ||
echo >&2 'done!' | ||
) | ||
umount /sysroot | ||
sync | ||
poweroff -ff | ||
EOS | ||
|
||
( | ||
dnf --setopt keepcache=1 install -y \ | ||
kernel binutils systemd-boot-unsigned btrfs-progs fsverity-utils | ||
dracut \ | ||
--uefi \ | ||
--no-hostonly \ | ||
--install 'sync fsverity' \ | ||
--include /tmp/fix-verity.sh /lib/dracut/hooks/pre-pivot/fix-verity.sh \ | ||
--kver "$(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}')" \ | ||
--kernel-cmdline="root=PARTLABEL=root-x86-64 console=ttyS0" \ | ||
/tmp/fix-verity.efi | ||
) >&2 | ||
|
||
cat /tmp/fix-verity.efi | ||
EOF | ||
mv tmp/fix-verity.efi fix-verity.efi | ||
fi | ||
|
||
qemu-system-x86_64 \ | ||
-nographic \ | ||
-m 4096 \ | ||
-enable-kvm \ | ||
-bios /usr/share/edk2/ovmf/OVMF_CODE.fd \ | ||
-drive file="${image_file}",if=virtio,media=disk \ | ||
-kernel fix-verity.efi |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#!/bin/sh | ||
|
||
set -eux | ||
|
||
chown -R 0:0 tmp/sysroot | ||
chcon -R system_u:object_r:usr_t:s0 tmp/sysroot/composefs | ||
chcon system_u:object_r:var_t:s0 tmp/sysroot/var | ||
|
||
> tmp/image.raw | ||
SYSTEMD_REPART_MKFS_OPTIONS_EXT4='-O verity' \ | ||
systemd-repart \ | ||
--empty=require \ | ||
--size=auto \ | ||
--dry-run=no \ | ||
--no-pager \ | ||
--offline=yes \ | ||
--root=tmp \ | ||
--definitions=repart.d \ | ||
tmp/image.raw |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[Partition] | ||
Type=esp | ||
Format=vfat | ||
CopyFiles=/efi:/ | ||
SizeMinBytes=512M | ||
SizeMaxBytes=512M |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[Partition] | ||
Type=root | ||
Format=ext4 | ||
SizeMinBytes=10G | ||
SizeMaxBytes=10G | ||
CopyFiles=/sysroot:/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#!/bin/sh | ||
|
||
set -eux | ||
|
||
cd "${0%/*}" | ||
|
||
qemu-system-x86_64 \ | ||
-m 4096 \ | ||
-enable-kvm \ | ||
-bios /usr/share/edk2/ovmf/OVMF_CODE.fd \ | ||
-drive file=image.qcow2,if=virtio,cache=unsafe \ | ||
-nic user,model=virtio-net-pci |
Review comment: This is a really useful document!