Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpm-ostree does not read /etc/dnf/vars #3241

Closed
pypingou opened this issue Nov 25, 2021 · 24 comments
Closed

rpm-ostree does not read /etc/dnf/vars #3241

pypingou opened this issue Nov 25, 2021 · 24 comments
Labels
jira for syncing to jira

Comments

@pypingou
Copy link

Host system details

Using osbuild we've built an ostree system based on CS9.

Expected vs actual behavior

# rpm-ostree install vim
Checking out tree 44e00bf... done
Enabled rpm-md repositories: baseos appstream
Updating metadata for 'baseos'... done
error: Updating rpm-md repo 'baseos': cannot update repo 'baseos': Cannot prepare internal mirrorlist: Status code: 404 for https://mirrors.centos.org/metalink?repo=centos-baseos-$stream&arch=aarch64&protocol=https,http (IP: 2001:4178:2:1269::fed2); Last error: Status code: 404 for https://mirrors.centos.org/metalink?repo=centos-baseos-$stream&arch=aarch64&protocol=https,http (IP: 2001:4178:2:1269::fed2)

Expected:

# rpm-ostree install foo
...
Success!

Steps to reproduce it

Produce a CS9-based ostree image/system.

The issue is the way the URL to the metalink is generated.
Currently, it hits:

# curl "https://mirrors.centos.org/metalink?repo=centos-baseos-9&arch=aarch64&protocol=https,http"
<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Thu, 25 Nov 2021 08:34:56 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
<!-- # repo = centos-baseos-9 arch = aarch64 error: invalid repo or arch
# following repositories are available:
# repo=atomic-21&arch=x86_64
....
# repo=updates-testing-source-fc6&arch=source

-->
</metalink>

but we need to it instead:

# curl "https://mirrors.centos.org/metalink?repo=centos-baseos-9-stream&arch=aarch64&protocol=https,http"
<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Thu, 25 Nov 2021 08:35:55 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
 <files>
  <file name="repomd.xml">
   <mm0:timestamp>1637259535</mm0:timestamp>
   <size>3988</size>
....
 </files>
</metalink>

The URL is defined as:

# cat /etc/yum.repos.d/centos.repo 
[baseos]
name=CentOS Stream $releasever - BaseOS
metalink=https://mirrors.centos.org/metalink?repo=centos-baseos-$stream&arch=$basearch&protocol=https,http

$stream should be replaced by 9-stream but is only replaced by 9.

$stream is defined in:

# cat /etc/dnf/vars/stream .
9-stream

It looks like rpm-ostree is not getting that value/information.

Is this analysis correct?

@lucab
Copy link
Contributor

lucab commented Nov 25, 2021

Looking at the URL in the rpm-ostree error:

https://mirrors.centos.org/metalink?repo=centos-baseos-$stream&arch=aarch64&protocol=https,http

It looks like $stream is not replaced at all. So I guess it is libdnf (as used by rpm-ostree) not picking up that variable.

@lucab
Copy link
Contributor

lucab commented Nov 25, 2021

I found rpm-software-management/libdnf#801 which seems related but should have been fixed long ago.
My understanding is that the expectation is that libdnf should transparently pick up those variables (without consumers involvement), but I may be wrong.

@pypingou
Copy link
Author

not that running dnf on the same system seems to pull fine the metadata, it fails at the download stage but that's expected

@jlebon
Copy link
Member

jlebon commented Nov 25, 2021

# cat /etc/dnf/vars/stream .
9-stream

To confirm, is this part of the base OSTree compose or did you add it? In case of the latter, try to bake it into the OSTree compose instead. At rpm-ostree install time, libdnf is pointed at a checkout of the OSTree commit before any /etc merge happens, so it won't see that file. Edit: of course, we can and should support reading these vars from the active deployment, though it'd require some libdnf patches.

@pypingou
Copy link
Author

I did not add this file, I've discovered it when looking around once I encountered the issue :)

@cgwalters cgwalters changed the title rpm-ostree errors out on CS9 rpm-ostree does not read /etc/dnf/vars Dec 1, 2021
@cgwalters
Copy link
Member

Ah, TIL there's now quay.io/centos/stream9 instead of just stream9-development, and these have different repo configs.

$ rpm -qf /etc/dnf/vars/stream 
centos-stream-repos-9.0-4.el9.noarch

OK I was able to quickly reproduce this on a FCOS test system by editing /etc/yum.repos.d/fedora.repo and s/fedora/$myos i.e. metalink=https://mirrors.fedoraproject.org/metalink?repo=$myos-$releasever&arch=$basearch to force a required variable resolution. But echo fedora > /etc/dnf/vars/myos didn't help.

@cgwalters
Copy link
Member

(Also TIL, C9S does not have a minimal ISO, only a 7.5GB one that I'm downloading now at 500k/s from some random mirror... 😢 )

@Conan-Kudo
Copy link

@BeeGrech
Copy link

Any updates to this issue?

@lucab lucab added the jira for syncing to jira label Apr 29, 2022
@lucab
Copy link
Contributor

lucab commented Apr 29, 2022

While looking into this, I was surprised to notice that setting variables through DNF_VAR_* environment flags does work:

# head /etc/yum.repos.d/centos.repo
[baseos]
name=CentOS Stream $releasever - BaseOS
metalink=https://mirrors.centos.org/metalink?repo=centos-baseos-$stream&arch=$basearch&protocol=https,http

# cat /etc/systemd/system/rpm-ostreed.service.d/env.conf
[Service]
Environment=DNF_VAR_stream=9-stream

# rpm-ostree refresh-md 
[...]
Updating metadata for 'baseos'... done
Importing rpm-md... done
rpm-md repo 'baseos'; generated: 2022-04-25T20:10:17Z solvables: 3218

# rpm-ostree --version | grep Version
 Version: '2022.8'

@lucab
Copy link
Contributor

lucab commented Apr 29, 2022

This seems to be a weird side-effect of libdnf install_root, which results in lookups at a surprising /usr/share/empty/etc/dnf/vars/ place:

# rpm-ostree usroverlay 
# mkdir -p /usr/share/empty/etc/dnf/vars
# echo '9-stream' > /usr/share/empty/etc/dnf/vars/stream
# systemctl stop rpm-ostreed
# rpm-ostree refresh-md 
[...]
Updating metadata for 'baseos'... done
rpm-md repo 'baseos'; generated: 2022-04-25T20:10:17Z solvables: 3218

This happens because rpm-ostree does:

* If either @install_root or @source_root are `NULL`, `/usr/share/empty` is
* used. Typically @source_root being `NULL` is for "from scratch" root

And internally libdnf does:

void
dnf_context_load_vars(DnfContext * context)
{
    auto priv = GET_PRIVATE(context);
    priv->vars->clear();
    for (auto dir = dnf_context_get_vars_dir(context); *dir; ++dir)
        ConfigMain::addVarsFromDir(*priv->vars, std::string(priv->install_root) + *dir);
    ConfigMain::addVarsFromEnv(*priv->vars);
    priv->varsCached = true;
}

@cgwalters
Copy link
Member

Hmm, but we should AFAICS always be passing the target deployment directory as source_root on the client side, so it shouldn't be NULL.

Main path should be prep_local_assembly() which does

  g_autofree char *tmprootfs_abspath = glnx_fdrel_abspath (self->tmprootfs_dfd, ".");
...
  if (!rpmostree_context_setup (self->ctx, tmprootfs_abspath, tmprootfs_abspath, cancellable,
                                error))
    return FALSE;

Can you add some debug prints in your test setup?

@lucab
Copy link
Contributor

lucab commented Apr 29, 2022

For refresh-md we are setting up a NULL install_root here:

/* We could bypass rpmostree_context_setup() here and call dnf_context_setup() ourselves
* since we're not actually going to perform any installation. Though it does provide us
* with the right semantics for install/source_root. */
if (!rpmostree_context_setup (ctx, NULL, origin_deployment_root, cancellable, error))
return FALSE;

For install it could be that it follows a different flow and ends up with a different arrangement. I assumed the two were going down the same logic, but I think I was wrong. I still need to trace that down.

@lucab
Copy link
Contributor

lucab commented Apr 29, 2022

For install it seems to be a different bug. We do get a non-NULL install_root, but libdnf can't find the directory there:

openat(AT_FDCWD, "/proc/self/fd/32/./etc/dnf/vars", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)

I don't know how much of tmprootfs_dfd is populated at that point, but I fear it is missing parts of /etc.

I tried tweaking libdnf to look for /usr/etc/dnf/vars instead and it is able to open it:

openat(AT_FDCWD, "/proc/self/fd/32/./usr/etc/dnf/vars", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36

@cgwalters
Copy link
Member

Ahhh. Hmm. So the backstory here is originally I was trying to keep the rpm-ostree changes "hermetic", i.e. we always construct the new root without pulling forward state from the current root.

Very specifically, the new root should not use the current rpmdb for example. That would break in a rebase scenario. It would also cause "state leakage" in a live apply.

But, obviously for things like /etc/yum.repos.d, we need that. And that's implemented via
dnf_context_set_repo_dir (self->dnfctx, "/etc/yum.repos.d"); - note this pulls from the booted deployment's /etc.

So...I think what we want is dnf_context_set_vars_root (self->dnfctx, "/") or so to override.

@lucab
Copy link
Contributor

lucab commented May 2, 2022

And that's implemented via
dnf_context_set_repo_dir (self->dnfctx, "/etc/yum.repos.d");
- note this pulls from the booted deployment's /etc.

An interesting detail is that libdnf already has an API for setting vars directories, which is very similar to the one for setting repos directories:

/**
 * dnf_context_set_repos_dir:
 * Sets the repositories directories.
 **/
void
dnf_context_set_repos_dir(DnfContext *context, const gchar * const *repos_dir)

[...]

/**
 * dnf_context_set_vars_dir:
 * Sets the repo variables directory.
 **/
void
dnf_context_set_vars_dir(DnfContext *context, const gchar * const *vars_dir)

It is unclear to me why they behave in different ways with respect to install_root prefixing.
I'm not sure the resulting API behavior is actually fully intended.

So...I think what we want is dnf_context_set_vars_root (self->dnfctx, "/") or so to override.

I'll move this discussion upstream to see whether they prefer tweaking the current behavior (to avoid install_root prefixing) or adding something like dnf_context_set_vars_root() on top of it.

Aside from that, I think that for the moment we can hack around this bug with a dnf_context_get_vars_dir() / dnf_context_set_vars_dir() roundtrip.
It should be enough to prefix each path with an appropriate amount of ../ so that they get rooted back to the host / at runtime.

@cgwalters
Copy link
Member

Aside from that, I think that for the moment we can hack around this bug with a dnf_context_get_vars_dir() / dnf_context_set_vars_dir() roundtrip.

Are you taking this?

@lucab
Copy link
Contributor

lucab commented May 2, 2022

libdnf ticket is at rpm-software-management/libdnf#1503.

Yes, I'll venture into assembling the hacked workaround on our side tomorrow.
Depending on how the upstream thread goes, it may possibly be even quicker to bump to a patched library.

@lucab
Copy link
Contributor

lucab commented May 3, 2022

#3652 is the workaround for this, at least partially. It seems to fix rpm-ostree refresh-md.

But rpm-ostree install is still having path lookup issues:

# file /etc/dnf/vars
/etc/dnf/vars: directory

# strace <rpm-ostreed.service MainPID>
openat(AT_FDCWD, "/", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 20
[...]
openat(20, "ostree/repo", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 21
[...]
openat(21, "extensions/rpmostree/private/commit", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 32
[...]
openat(AT_FDCWD, "/proc/self/fd/32/./../../../../../etc/dnf/vars", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)

I do fear that this is due to the procfs-relative path resolution, which teleports the lookup somewhere else in the hierarchy.
Maybe rubbing a bit of realpath(install_root) may help in this case?
Without an actual fix in libdnf, I'm starting to run out of workaround ideas here...

@cgwalters
Copy link
Member

Nothing stops us from shipping a patched libdnf short term, we'd just need to fork the repo to e.g. https://github.com/coreos/libdnf or so.

@Conan-Kudo
Copy link

I think I had to do my own workaround for vars in PackageKit for a different reason. Maybe that could help here?

This is what I did in PackageKit: PackageKit/PackageKit@ed73aa6

@cgwalters
Copy link
Member

Thanks for the reference, offhand it does seem like changing the semantics of this in libdnf might break other things like PackageKit and we'd need to keep that in mind.

(Tangentially related to all this, what I think we want is to avoid string paths entirely and have dnf_context_set_vars_dir_fd() that takes a directory file descriptor, then it's unambiguous)

@lucab
Copy link
Contributor

lucab commented May 3, 2022

@Conan-Kudo thanks for the hint!
Though the PackageKit logic leaves me additionally puzzled because I would expect that code to result in a double install_root prefix. I'm unsure what's the trick with regards to that, but I left a comment at PackageKit/PackageKit#369 (comment) with my understanding of it.

@lucab
Copy link
Contributor

lucab commented May 4, 2022

#3652 (already merged) now carries a workaround for both install and refresh-md flows, so this can be closed.

Once the libdnf ticket will be addressed too, we could bump the submodule and remove our workaround in favor of a long-term fix.

@lucab lucab closed this as completed May 4, 2022
lucab added a commit to lucab/libdnf that referenced this issue May 4, 2022
This drops the internal prefixing logic for vars directories, thus
avoiding the prepended `install_root` path prefix on all entries.
It also results in aligning the behaviors of `dnf_context_set_vars_dir()`
and `dnf_context_set_repos_dir()`.

Existing consumers in the wild are not really expecting this kind
of prefixing to happen internally. Additionally, developers seem to
naturally assume that the lookalike APIs for repos-dir and vars-dir
behave in the same way.

Ref: coreos/rpm-ostree#3241
Ref: https://github.com/PackageKit/PackageKit/pull/369/files#r863937752
Closes: rpm-software-management#1503
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira for syncing to jira
Projects
None yet
Development

No branches or pull requests

6 participants