Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and enhance invalid symlink handling when installing certificates to target container #1135

Merged
merged 1 commit into from
Nov 15, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -345,12 +345,80 @@ def _get_files_owned_by_rpms(context, dirpath, pkgs=None, recursive=False):
return files_owned_by_rpms


def _copy_decouple(srcdir, dstdir):
"""
Copy `srcdir` to `dstdir` while decoupling symlinks.

What we mean by decoupling the `srcdir` is that any symlinks pointing
outside the directory will be copied as regular files. This means that the
directory will become independent from its surroundings with respect to
symlinks. Any symlink (or symlink chains) within the directory will be
preserved.

"""

for root, dummy_dirs, files in os.walk(srcdir):
for filename in files:
relpath = os.path.relpath(root, srcdir)
source_filepath = os.path.join(root, filename)
target_filepath = os.path.join(dstdir, relpath, filename)

# Skip and report broken symlinks
if not os.path.exists(source_filepath):
api.current_logger().warning(
'File {} is a broken symlink! Will not copy the file.'.format(source_filepath))
continue

# Copy symlinks to the target userspace
source_is_symlink = os.path.islink(source_filepath)
pointee = None
if source_is_symlink:
pointee = os.readlink(source_filepath)

# If source file is a symlink within `srcdir` then preserve it,
# otherwise resolve and copy it as a file it points to
if pointee is not None and not pointee.startswith(srcdir):
# Follow the path until we hit a file or get back to /etc/pki
while not pointee.startswith(srcdir) and os.path.islink(pointee):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkubek in this case it's not accorate as the srcdir is nowadays /etc/pki instead of /etc/pki/. From this point, if I have symlink to /etc/pki-custom, it will be missleading. From this piont, I suggest to update the call of the function to _copy_decouple('/etc/pki/', target_pki), which will resolve this very corner case. In some other actors, we have this check resolved, but as TBH, I would not go so far in this case to have Py2 & Py3 compatible code (py3 has a nice functions for that in os.path, but py2 doesn't).

it's very corner (I do not expect customers to create own /etc/pki..../ dir) so it's not blocking the merge nor it's required to be applied from my side, but keeping the space for the application if you want to. I want to merge the PR ideally tomorrow (Wed).

pointee = os.readlink(pointee)

# Pointee points to a _regular file_ outside /etc/pki so we
# copy it instead
if not pointee.startswith(srcdir) and not os.path.islink(pointee):
source_is_symlink = False
source_filepath = pointee
else:
# pointee points back to /etc/pki
pass

# Ensure parent directory exists
parent_dir = os.path.dirname(target_filepath)
# Note: This is secure because we know that parent_dir is located
# inside of `$target_userspace/etc/pki` which is a directory that
# is not writable by unprivileged users. If this function is used
# elsewhere we may need to be more careful before running `mkdir -p`.
run(['mkdir', '-p', parent_dir])
dkubek marked this conversation as resolved.
Show resolved Hide resolved

if source_is_symlink:
# Preserve the owner and permissions of the original symlink
run(['ln', '-s', pointee, target_filepath])
run(['chmod', '--reference={}'.format(source_filepath), target_filepath])
abadger marked this conversation as resolved.
Show resolved Hide resolved
continue

run(['cp', '-a', source_filepath, target_filepath])


def _copy_certificates(context, target_userspace):
"""
Copy the needed certificates into the container, but preserve original ones
Copy certificates from source system into the container, but preserve
original ones

Some certificates are already installed in the container and those are
default certificates for the target OS, so we preserve these.

We respect the symlink hierarchy of the source system within the /etc/pki
folder. Dangling symlinks will be ignored.

"""

target_pki = os.path.join(target_userspace, 'etc', 'pki')
Expand All @@ -360,36 +428,56 @@ def _copy_certificates(context, target_userspace):
files_owned_by_rpms = _get_files_owned_by_rpms(target_context, '/etc/pki', recursive=True)
api.current_logger().debug('Files owned by rpms: {}'.format(' '.join(files_owned_by_rpms)))

# Backup container /etc/pki
run(['mv', target_pki, backup_pki])
context.copytree_from('/etc/pki', target_pki)

# Copy source /etc/pki to the container
_copy_decouple('/etc/pki', target_pki)

# Assertion: after running _copy_decouple(), no broken symlinks exist in /etc/pki in the container
# So any broken symlinks created will be by the installed packages.

# Recover installed packages as they always get precedence
for filepath in files_owned_by_rpms:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference in what the following code is attempting to achieve and the code in _copy_decouple? It feels like the link detection and possible dereferencing and copying would be the same and the main difference is just that we're looping over different files? The implementations are a bit different so I'm not sure if I'm understanding it, but if the purpose is the same, we should look at making it a helper function that we call from both places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree that the implementation is quite similar and I have considered to abstract it to to a common function, however there is a subtle difference in checking whether the symlink is broken. When checking a symlink /etc/pki/A -> /etc/pki/B on the source system, it is easy to dereference it to file /etc/pki/B, however consider a symlink installed in the container $target_container/etc/pki/A -> /etc/pki/B - every time we dereference a link we have to manually update the path to point to a file within the the context of the container. Another small difference is that we have made the decision to respect the files installed by RPMs. So whenever they point to a file outside /etc/pki we keep this symlink there, even though when copying certificates from the source system we would replace this link with the file. I am not sure whether this happens or not or makes a significant difference. With these small differences I felt justified in keeping the two procedures separate (even though very similar). However if you still feel differently I will figure something out.

src_path = os.path.join(backup_pki, filepath)
dst_path = os.path.join(target_pki, filepath)

# Resolve and skip any broken symlinks
is_broken_symlink = False
while os.path.islink(src_path):
# The symlink points to a path relative to the target userspace so
# we need to readjust it
next_path = os.path.join(target_userspace, os.readlink(src_path)[1:])
if not os.path.exists(next_path):
is_broken_symlink = True

# The path original path of the broken symlink in the container
report_path = os.path.join(target_pki, os.path.relpath(src_path, backup_pki))
api.current_logger().warning('File {} is a broken symlink!'.format(report_path))
break

src_path = next_path
pointee = None
if os.path.islink(src_path):
pointee = os.path.join(target_userspace, os.readlink(src_path)[1:])

seen = set()
while os.path.islink(pointee):
# The symlink points to a path relative to the target userspace so
# we need to readjust it
pointee = os.path.join(target_userspace, os.readlink(src_path)[1:])
if not os.path.exists(pointee) or pointee in seen:
is_broken_symlink = True

# The path original path of the broken symlink in the container
report_path = os.path.join(target_pki, os.path.relpath(src_path, backup_pki))
api.current_logger().warning(
'File {} is a broken symlink! Will not copy!'.format(report_path))
break

seen.add(pointee)

if is_broken_symlink:
continue

# Cleanup conflicting files
run(['rm', '-rf', dst_path])

# Ensure destination exists
parent_dir = os.path.dirname(dst_path)
run(['mkdir', '-p', parent_dir])
run(['cp', '-a', src_path, dst_path])

# Copy the new file
run(['cp', '-R', '--preserve=all', src_path, dst_path])

run(['rm', '-rf', backup_pki])


def _prep_repository_access(context, target_userspace):
Expand All @@ -401,6 +489,10 @@ def _prep_repository_access(context, target_userspace):
backup_yum_repos_d = os.path.join(target_etc, 'yum.repos.d.backup')

_copy_certificates(context, target_userspace)
# NOTE(dkubek): context.call(['update-ca-trust']) seems to not be working.
# I am not really sure why. The changes to files are not
# being written to disk.
Comment on lines +492 to +494
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably look into why, right? Is this something that I can replicate using a failing unittest and replacing the run(chroot....) with context.call([update-ca-trust...)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not suitable for unit-tests (in the way that unit-tests should uncover this problem) as we want to omit unit-tests that are actually somehow messing with the system (it would require an installation of a container somewhere, ....). This is something what should be covered by integration tests.

One difference between the run and context.call is, that the run operation executes chroot, in which the command is performed. Whereas the context.call performs systemd-nspawn. Also env & some sys/run/... files could be different. I am curious what is the real reason behind that difference. We should definitely check that. @Jakuje any thought?

run(["chroot", target_userspace, "/bin/bash", "-c", "su - -c update-ca-trust"])

if not rhsm.skip_rhsm():
run(['rm', '-rf', os.path.join(target_etc, 'rhsm')])
Expand Down
Loading
Loading