Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File attributes are not correctly hashed or preserved by ostree container encapsulate resulting in a failure to pull images with bootc and rpm-ostree #920

Closed
prydom opened this issue Nov 25, 2024 · 5 comments
Labels
area/client Related to the client/CLI bug Something isn't working

Comments

@prydom
Copy link

prydom commented Nov 25, 2024

Recently I've observed pulling down images created with rpm-ostree compose container-encapsulate results in errors similar to this when running bootc upgrade:

Importing: Unencapsulating base: Importing commit: Importing object 4c/2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7.file: Processing content object 4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7: Importing regfile small: Writing content object: Corrupted file object; checksum expected='4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7' actual='35a978c60a57d76b632d293c84b731e5e6e312fad73520888152a84d798d7472'

I determined on my images the content objects causing issues were files where dnf5 wrote "user.librepo.checksum.mtime" and "user.librepo.checksum.sha256" extended attributes. These are written on RPMs installed from a (local to the container) repository.

It is possible this is a regression due to ostreedev/ostree-rs-ext#679 because it's the only recent ostree-rs-ext/bootc commit touching extended attributes. It is also possible this is a bug that's always existed but is uncovered by preserving extended attributes from the container image tar stream. Regardless, this only started happening during the past week or so after rpm-ostree 2024.9-1 got released to rawhide.

I have written a minimal reproduction using ostree container encapsulate and ostree container image pull.
Review and run this script with the working directory set to an empty directory or one you don't care about. It creates small ostree repos, commits, and container images in the current directory.

There are 3 changes to this script you can make to cause the test to pass with some degree of success (container pull succeeds and possibly ostree fsck as well; ideally both would succeed):

  1. Remove the --selinux-policy argument on the ostree commit command. This is not a viable workaround because I need to relabel at this stage of building an image.
  2. Remove all user.* extended attributes from the image. This is the workaround I went with for now, only user.librepo.* attributes are triggering this bug for me.
  3. Encapsulate to OCI an ostree commit stored in a bare-user repo instead of a bare repo. I haven't tested this with a real bootable image yet. This allows the ostree container image pull to proceed but ostree fsck -a in a bare repo we pull into later doesn't pass.

Reproduction script

Expand this to see reproduction script
#!/usr/bin/bash

# Echo executed commands and stop on the first error
set -xe

# This script should be running as root to use bare ostree repos. If you use bare-user then you can drop.
# Run in a VM or rootless container (as root in the mount namespace) if you don't trust the script.
if [ "$EUID" -ne 0 ]
    then echo "You may consider running as root for bare ostree repos"
fi

# Cleanup previous runs
# DO NOT RUN THIS SCRIPT IN A WORKING DIRECTORY YOU CARE ABOUT
rm -rf image repo commit checkout

# Create a new bare ostree repo as root
ostree init --repo=repo --mode=bare

# Create a new commit with just a single file with an extended attribute
mkdir commit
cd commit
echo "hello world" > test.txt
# This is what my system policy labels a file called test.txt on / as, comment out or change if this differs for you.
# I just have it here to play a game of spot the difference
setfattr -n security.selinux -v "system_u:object_r:etc_runtime_t:s0" test.txt
setfattr -n user.librepo.checksum.mtime -v 0000 test.txt
getfattr -d -m - -- *
stat test.txt
cd ..

# Commit that to the new ostree repo with ref xattrtest
# Note that in my real workflow I use an selinux policy from the image.
# I use the system one because this isn't a real bootable image.
# I usually use --consume but don't here because this is not from an existing ostree commit
ostree commit \
--branch=xattrtest \
--repo=repo \
"--selinux-policy=/" \
commit
rm -rf commit

# fsck the repo
# this is the first time the behavior is a bit unexpected as the selinux policy relabel causes a content hash mismatch
ostree fsck --repo=repo -a || true

# checkout to show that ostree is keeping the xattrs
ostree checkout --repo=repo \
--require-hardlinks \
"xattrtest" checkout

# Print the xattrs
cd checkout
getfattr -d -m - -- *
stat test.txt
cd ..
rm -rf checkout

# encapsulate to a OCI container
ostree container encapsulate \
--repo=repo \
--cmd="/usr/bin/bash" \
--label="containers.bootc=1" \
--label="ostree.bootable=true" \
xattrtest "dir:$PWD/image"

# reset the ostree repo
rm -rf repo
ostree init --repo=repo --mode=bare

# Pull the container - THIS MIGHT FAIL AND END THE SCRIPT due to `set -xe`
ostree container image pull repo "ostree-unverified-image:dir:$PWD/image"

# If the above doesn't fail, this may still fail
# fsck the repo to show that everything is preserved or not
ostree fsck --repo=repo -a || true

# Checkout the pulled container
REF="$(ostree refs --repo=repo ostree/container/image)"
ostree checkout --repo=repo \
--require-hardlinks \
"ostree/container/image/$REF" checkout

# print the xattrs
cd checkout
getfattr -d -m - -- *
stat test.txt
cd ..

# Clean up
rm -rf checkout image repo

Running this script on the rpm-ostree versions I detail below results in the following output.

Expand this to see reproduction output
+ '[' 0 -ne 0 ']'
+ rm -rf image repo commit checkout
+ ostree init --repo=repo --mode=bare
+ mkdir commit
+ cd commit
+ echo 'hello world'
+ setfattr -n security.selinux -v system_u:object_r:etc_runtime_t:s0 test.txt
+ setfattr -n user.librepo.checksum.mtime -v 0000 test.txt
+ getfattr -d -m - -- test.txt
# file: test.txt
security.selinux="system_u:object_r:etc_runtime_t:s0"
user.librepo.checksum.mtime="0000"

+ stat test.txt
  File: test.txt
  Size: 12              Blocks: 8          IO Block: 4096   regular file
Device: 0,46    Inode: 25915273    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2024-11-25 13:59:21.973942016 -0800
Modify: 2024-11-25 13:59:21.973942016 -0800
Change: 2024-11-25 13:59:21.974942015 -0800
 Birth: 2024-11-25 13:59:21.973942016 -0800
+ cd ..
+ ostree commit --branch=xattrtest --repo=repo --selinux-policy=/ commit
6bfe549dca7a60f6dbfc44ac1a5568bebbcba3f437299c48407078fa0acb4b5f
+ rm -rf commit
+ ostree fsck --repo=repo -a
Validating refs...
Validating refs in collections...
Enumerating commits...
Verifying content integrity of 1 commit objects...
fsck objects (1/4) [===          ]  25%In commits 6bfe549dca7a60f6dbfc44ac1a5568bebbcba3f437299c48407078fa0acb4b5f: fsck content object 4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7: Corrupted file object; checksum expected='4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7' actual='35a978c60a57d76b632d293c84b731e5e6e312fad73520888152a84d798d7472'
fsck objects (4/4) [=============] 100%
error: Repository corruption encountered
+ true
+ ostree checkout --repo=repo --require-hardlinks xattrtest checkout
+ cd checkout
+ getfattr -d -m - -- test.txt
# file: test.txt
security.selinux="system_u:object_r:etc_runtime_t:s0"
user.librepo.checksum.mtime="0000"

+ stat test.txt
  File: test.txt
  Size: 12              Blocks: 8          IO Block: 4096   regular file
Device: 0,46    Inode: 25915280    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2024-11-25 13:59:22.024941996 -0800
Modify: 1969-12-31 16:00:00.000000000 -0800
Change: 2024-11-25 13:59:22.030941993 -0800
 Birth: 2024-11-25 13:59:21.994942007 -0800
+ cd ..
+ rm -rf checkout
+ ostree container encapsulate --repo=repo --cmd=/usr/bin/bash --label=containers.bootc=1 --label=ostree.bootable=true xattrtest dir:/var/home/USER/ostree-tests/check-corrupt/image
sha256:b87004ace3ab73bb3c95f1fcfa5fe3ba6cdb31c2bd945cf88ef42355bd8df55f
+ rm -rf repo
+ ostree init --repo=repo --mode=bare
+ ostree container image pull repo ostree-unverified-image:dir:/var/home/USER/ostree-tests/check-corrupt/image
layers already present: 0; layers needed: 1 (3.3 kB)
error: Importing: Unencapsulating base: Importing commit: Importing object 4c/2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7.file: Processing content object 4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7: Importing regfile small: Writing content object: Corrupted file object; checksum expected='4c2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7' actual='35a978c60a57d76b632d293c84b731e5e6e312fad73520888152a84d798d7472'

Here's what the extended attributes look like in the container image:

Expand this to see interrogating the container
$ podman pull dir:$PWD/image
Getting image source signatures
Copying blob 02641bd1db4f done   | 
Copying config 75b22a9daa done   | 
Writing manifest to image destination
75b22a9daa1864a0f37b577d82c11e0acd61572e3f9f5d9ddb816120c7689d99

$ podman create --name ostree-check 75b22a9daa1864a0f37b577d82c11e0acd61572e3f9f5d9ddb816120c7689d99
34300a50f27a5e43627ede137f9194c298894dfd8e9d36ae7c387d9cc691ad3c

$ cd `podman mount ostree-check`

$ getfattr -d -m - -- test.txt
# file: test.txt
security.selinux="system_u:object_r:container_file_t:s0:c195,c980" 

$ stat test.txt 
  File: test.txt
  Size: 12              Blocks: 8          IO Block: 4096   regular file
Device: 0,105   Inode: 25916335    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:container_file_t:s0:c195,c980
Access: 1969-12-31 16:00:00.000000000 -0800
Modify: 1969-12-31 16:00:00.000000000 -0800
Change: 2024-11-25 14:24:20.525324527 -0800
 Birth: 2024-11-25 14:24:20.525324527 -0800


$ strings sysroot/ostree/repo/objects/4c/2f5284359a960f16ae5f3cfb70534035ee0620c14234855975da1cda95ccc7.file-xattrs-link
security.selinux
system_u:object_r:etc_runtime_t:s0
user.librepo.checksum.mtime
0000

Software versions and environment

The underlying filesystem is btrfs. I use the Fedora 41 kernel, not the rawhide one.

$ cat /etc/os-release | head -n2
NAME="Fedora Linux"
VERSION="Rawhide.20241124.n.0 (Kinoite Prerelease)"

$ uname -a
Linux hostname 6.11.8-300.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Nov 14 20:37:39 UTC 2024 x86_64 GNU/Linux

$ bootc --version
bootc 1.1.2

$ rpm-ostree --version
rpm-ostree:
 Version: '2024.9'
 Git: bd152998332c2b11af671447d3fd311319d5b3a8
 Features:
  - rust
  - compose
  - container
  - fedora-integration

$ ostree --version
libostree:
 Version: '2024.9'
 Git: 14e29775963d44bc533715efcf2362d6051756de
 Features:
  - inode64
  - initial-var
  - libcurl
  - libsoup3
  - gpgme
  - composefs
  - ex-fsverity
  - libarchive
  - selinux
  - openssl
  - sign-ed25519
  - libmount
  - systemd
  - release
  - p2p

@cgwalters cgwalters added bug Something isn't working area/client Related to the client/CLI labels Nov 25, 2024
@cgwalters
Copy link
Collaborator

Hmmmm.... ostreedev/ostree-rs-ext#679 should only be affecting "layered" (i.e. non-ostree) paths. To be clear this container image is a "pure ostree" one, you haven't created any additional layers?

Remove all user.* extended attributes from the image. This is the workaround I went with for now, only user.librepo.* attributes are triggering this bug for me.

That seems sane as a workaround - there's no real reason to ship those xattrs anyways, right?

But yes...it does look like there's some sort of bug here.

@prydom
Copy link
Author

prydom commented Nov 25, 2024

This reproduces with the same result on rpm-ostree layered images as I'm running into this on my own image builds. This narrows the scope for the bug a bit since rpm-ostree calls the "pure ostree" path: https://github.com/coreos/rpm-ostree/blob/main/rust/src/container.rs#L519C36-L519C47. My workflow is to create a "pure ostree" image (layered/chunked with rpm-ostree) and then layer further software on top with buildah. Both the base image and the later layered images fail to pull.

It is just much harder to write a small reproduction script to share here using a real bootable image because I need a RPM database or to download a bootstrap image.

@prydom
Copy link
Author

prydom commented Nov 25, 2024

Also, on later images that build on the workaround base image that can be pulled using dnf5 - where the "user.librepo.*" xattrs are in the container tar stream itself - I get failures on ostree fsck -a for the RPM files and repodata/ XML files on the deployed machines.

The machines boot fine and the file contents are OK, so that doesn't matter as much but might cause users to become concerned if they layer xattrs during a container build.

EDIT: This is a different bug when I layer xattrs on a working base image. I can separately file that one if you want @cgwalters.

@prydom
Copy link
Author

prydom commented Nov 26, 2024

I think I can answer why this happens for the "pure ostree" case, @cgwalters. It seems the root cause has been in https://github.com/ostreedev/ostree for a while and it may just be some coincidental changes in libdnf5, librpm or librepo which caused this to start failing container pulls now.

libglnx will always sort the extended attributes returned from glnx_dfd_name_get_all_xattrs with canonicalize_xattrs(). This ensures we should always get the same content hash as long as the same file contents + uid/gid/perms/xattrs match even if the underlying filesystem returns a different order.

However, during commit the way get_final_xattrs() works is to first delete the security.selinux attribute with _ostree_filter_selinux_xattr() and then append the new label to the end of the gvariant array.

Linux capabilities have been working because "security.capabilities" always comes before "security.selinux". However the existence of any "user.*" attributes means that list will no longer be sorted and therefore the hash calculated at commit time will not match what is stored in the bare repo and what is copied into the encapsulated commit container image.

So, preserving system.* (e.g. ACLs) or trusted.* extended attributes would be bit by this same bug. I have reproduced the same failure to pull the container by adding an ACL.

# Create a new commit with just a single file with an extended attribute
mkdir commit
cd commit
echo "hello world" > test.txt
setfattr -n security.selinux -v "system_u:object_r:etc_runtime_t:s0" test.txt
setfacl -m "u:root:rw-" test.txt
setcap 'cap_net_bind_service=ep' test.txt
getfattr -d -m - -- *
stat test.txt
cd ..

@prydom
Copy link
Author

prydom commented Dec 2, 2024

I have completed some end-to-end testing to confirm the behavior of bootc that I was encountering are root caused in and fully resolved by ostreedev/ostree#3346. Extended attributes of all types stored in bare, bare-user, and bare-split-xattrs ostree repositories and those in derived layers' tar streams will be properly handled by both OSTree and Composefs when the commit from the PR is merged. There were no backwards compatibility concerns encountered during testing.

See ostreedev/ostree#3346 (comment) for details of the testing. I'm closing this issue now since there is nothing that needs to be changed in bootc or ostree-rs-ext.

The issue can be tracked in ostreedev/ostree#3343 until the PR is merged.

@prydom prydom closed this as completed Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/client Related to the client/CLI bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants